Scale-Invariant Feature Transform (SIFT)

Jinxiang Chai

Image Processing

- Median filtering

- Bilateral filtering

- Edge detection

- Corner detection

1. Compute the image gradients Ix and Iy

2. Construct the 2x2 matrix C = [ΣIx², ΣIxIy; ΣIxIy, ΣIy²] from the gradients over each pixel's neighborhood

3. Determine the two eigenvalues of C at each pixel: λ(i,j) = [λ1, λ2]

4. If both λ1 and λ2 are big, we have a corner

Corners are detected where both λ1 and λ2 are big
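The four steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: `harris_eigenvalues` is a hypothetical helper, the neighborhood is a plain box window, and the eigenvalues are computed in closed form. The classic Harris detector instead uses a Gaussian window and the response det(C) - k·trace(C)², which avoids computing the eigenvalues explicitly.

```python
import numpy as np

def harris_eigenvalues(img, half=1):
    """Eigenvalues (lam1 >= lam2) of the 2x2 gradient matrix at every pixel.

    The matrix sums Ix*Ix, Ix*Iy, Iy*Iy over a (2*half+1) x (2*half+1) box
    window; the box window (not a Gaussian) is a simplifying assumption.
    """
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)  # derivatives along rows (y) and columns (x)
    h, w = img.shape

    def box_sum(a):
        p = np.pad(a, half, mode="constant")
        out = np.zeros_like(a)
        for dy in range(2 * half + 1):
            for dx in range(2 * half + 1):
                out += p[dy:dy + h, dx:dx + w]
        return out

    sxx, syy, sxy = box_sum(Ix * Ix), box_sum(Iy * Iy), box_sum(Ix * Iy)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    mean = (sxx + syy) / 2.0
    root = np.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return mean + root, mean - root

# A bright square on a dark background: a corner, an edge and a flat interior
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
lam1, lam2 = harris_eigenvalues(img)
print(lam2[5, 5], lam2[5, 10], lam2[10, 10])  # corner > 0; edge and flat are ~0
```

On the edge only one eigenvalue is large, so the smaller one stays near zero; only at the corner are both big.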

- What are we looking for?
- Strong features
- Invariant to changes (affine and perspective/occlusion)
- Solve the problem of correspondence
- Locate an object in multiple images (e.g., in video)
- Track the path of the object, infer 3D structure, and recover object and camera movement

- Choosing features that are invariant to image scaling and rotation
- Also, partially invariant to changes in illumination and 3D camera viewpoint

- Illumination
- Scale
- Rotation
- Affine

- David G. Lowe, "Object recognition from local scale-invariant features," Proc. International Conference on Computer Vision (ICCV), 1999
- David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60(2), 2004, pp. 91-110

- Earlier Methods
- Harris corner detector
- Sensitive to changes in image scale
- Finds locations in image with large gradients in two directions

- No method was fully affine invariant
- Although the SIFT approach is not fully invariant, it allows for considerable affine change
- SIFT also allows for changes in 3D viewpoint

- Scale-space extrema detection
- Keypoint localization
- Orientation assignment
- Keypoint descriptor generation

- Different scales are appropriate for describing different objects in the image, and we may not know the correct scale/size ahead of time.

- Looking for features (locations) that are stable (invariant) across all possible scale changes
- use a continuous function of scale (scale space)

- Which scale-space kernel will we use?
- The Gaussian Function

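- Scale space L of an image: a variable-scale Gaussian G convolved with the input image I

$$L(x,y,\sigma) = G(x,y,\sigma) * I(x,y), \qquad G(x,y,\sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2+y^2)/2\sigma^2}$$

- Difference of Gaussians at two scales separated by a constant factor k:

$$D(x,y,\sigma) = \big(G(x,y,k\sigma) - G(x,y,\sigma)\big) * I(x,y) = L(x,y,k\sigma) - L(x,y,\sigma)$$

Look familiar?

- Bandpass filter!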
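The bandpass claim can be checked numerically by sampling the kernel G(x,y,kσ) - G(x,y,σ) from the formula above. A minimal sketch, assuming k = √2 and a grid truncated at 4kσ (both choices are ours, not the slides'); `gaussian_2d` and `dog_kernel` are hypothetical helper names. A kernel that sums to roughly zero gives almost no response to a flat (DC) image, and subtracting two low-pass filters also removes the finest detail.

```python
import numpy as np

def gaussian_2d(sigma, radius):
    """Sampled G(x, y, sigma) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2)."""
    ax = np.arange(-radius, radius + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def dog_kernel(sigma, k=np.sqrt(2)):
    """G(x, y, k*sigma) - G(x, y, sigma) on a common grid wide enough for both."""
    radius = int(np.ceil(4 * k * sigma))
    return gaussian_2d(k * sigma, radius) - gaussian_2d(sigma, radius)

kern = dog_kernel(1.0)
c = kern.shape[0] // 2
print(kern.sum())   # close to 0: a flat image produces almost no response
print(kern[c, c])   # negative centre, surrounded by a positive ring
```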

- A = Convolve image with vertical and horizontal 1D Gaussians, σ=sqrt(2)
- B = Convolve A with vertical and horizontal 1D Gaussians, σ=sqrt(2)
- DOG (Difference of Gaussian) = A – B
- So how to deal with different scales?

- Downsample B with bilinear interpolation at a pixel spacing of 1.5 (each sample is a linear combination of 4 adjacent pixels); the result is the input for the next pyramid level

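[Figure: three-level DOG pyramid. At each level the image is blurred to give A and blurred again to give B, DOG = A - B, and B is downsampled to start the next level (A1, B1, DOG1 → A2, B2, DOG2 → A3, B3, DOG3)]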
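The pyramid construction above (A = blur, B = blur of A, DOG = A - B, downsample B by 1.5) can be sketched as follows. This is an illustrative version only: the helper names are hypothetical, the 1D Gaussian is truncated at 3σ, borders use reflection padding, and the optional 2x expansion of the input is left out.

```python
import numpy as np

def blur(img, sigma):
    """Separable Gaussian blur: a 1D kernel along rows, then along columns."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    p = np.pad(img, radius, mode="reflect")
    p = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, p)

def downsample_bilinear(img, spacing=1.5):
    """Resample img every `spacing` pixels using bilinear interpolation."""
    h, w = img.shape
    ys, xs = np.arange(0, h - 1, spacing), np.arange(0, w - 1, spacing)
    i, j = np.floor(ys).astype(int), np.floor(xs).astype(int)
    fy, fx = (ys - i)[:, None], (xs - j)[None, :]  # fractional offsets
    tl, tr = img[np.ix_(i, j)], img[np.ix_(i, j + 1)]
    bl, br = img[np.ix_(i + 1, j)], img[np.ix_(i + 1, j + 1)]
    top = tl + fx * (tr - tl)
    bottom = bl + fx * (br - bl)
    return top + fy * (bottom - top)

def dog_pyramid(image, levels=3, sigma=np.sqrt(2)):
    """Return the list [DOG1, DOG2, ...], finest level first."""
    dogs = []
    current = np.asarray(image, dtype=float)
    for _ in range(levels):
        a = blur(current, sigma)      # A
        b = blur(a, sigma)            # B
        dogs.append(a - b)            # DOG = A - B
        current = downsample_bilinear(b, 1.5)
    return dogs

rng = np.random.default_rng(0)
dogs = dog_pyramid(rng.random((64, 64)), levels=3)
print([d.shape for d in dogs])
```

Each level shrinks by about 1.5 in each direction, so a 64x64 input gives levels of roughly 64, 42 and 28 pixels per side.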

- Initial smoothing discards the highest spatial frequencies of the image
- Fix: expand the input image by a factor of 2, using bilinear interpolation, before building the pyramid

- How to do downsampling with bilinear interpolation?

Weighted sum of four neighboring pixels

[Figure: the sample point (x,y) lies inside the square of pixels (i,j), (i,j+1), (i+1,j), (i+1,j+1)]

Sampling at S(x,y), where a and b are the fractional offsets of (x,y) from pixel (i,j) along the j and i directions (0 ≤ a, b < 1):

$$S(x,y) = (1-a)(1-b)\,S(i,j) + a(1-b)\,S(i,j+1) + (1-a)\,b\,S(i+1,j) + a\,b\,S(i+1,j+1)$$

To optimize the above, interpolate along j within each of the two rows, then between the rows:

$$S_i = S(i,j) + a\,\big(S(i,j+1) - S(i,j)\big)$$

$$S_{i+1} = S(i+1,j) + a\,\big(S(i+1,j+1) - S(i+1,j)\big)$$

$$S(x,y) = S_i + b\,(S_{i+1} - S_i)$$
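Both forms of the bilinear formula can be written side by side to confirm they agree. A small sketch in plain Python, assuming the image is a list of rows, y indexes rows (i) and x indexes columns (j); the function names are hypothetical. The second form needs only three multiplications per sample instead of eight.

```python
import math

def bilinear_weighted(S, x, y):
    """Four-term weighted sum; S is a list of rows, y indexes rows (i), x columns (j)."""
    i, j = int(math.floor(y)), int(math.floor(x))
    b, a = y - i, x - j  # fractional offsets along i and j
    return ((1 - a) * (1 - b) * S[i][j] + a * (1 - b) * S[i][j + 1]
            + (1 - a) * b * S[i + 1][j] + a * b * S[i + 1][j + 1])

def bilinear_lerp(S, x, y):
    """Same value via two interpolations along j, then one along i."""
    i, j = int(math.floor(y)), int(math.floor(x))
    b, a = y - i, x - j
    s_i = S[i][j] + a * (S[i][j + 1] - S[i][j])                  # row i
    s_i1 = S[i + 1][j] + a * (S[i + 1][j + 1] - S[i + 1][j])     # row i+1
    return s_i + b * (s_i1 - s_i)                                # between rows

S = [[0.0, 10.0], [20.0, 30.0]]
print(bilinear_weighted(S, 0.5, 0.5), bilinear_lerp(S, 0.5, 0.5))
```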

- Find maxima and minima of scale space
- For each point on a DOG level:
- Compare to 8 neighbors at same level
- If max/min, identify the corresponding point at the pyramid level below (accounting for the 1.5x resampling)
- Check whether the point is also greater (or smaller) than that corresponding point and its 8 neighbors
- If so, repeat at pyramid level above

- Repeat for each DOG level
- Those that remain are key points

[Figure: a candidate point on DOG L compared with its neighbors on DOG L-1, DOG L and DOG L+1]
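The neighbor comparisons above can be sketched as below. Assumptions: `dogs` is a list of DOG levels with the finest first, each level is 1.5 times smaller than the one below (as in the pyramid described earlier), and corresponding points are found by scaling coordinates by 1.5 and rounding. The function names are hypothetical, and for simplicity only levels that have a neighbor both below and above are searched.

```python
import numpy as np

def window3(img, r, c):
    """3x3 window centred on (r, c), clipped at the image border."""
    h, w = img.shape
    r, c = min(max(r, 0), h - 1), min(max(c, 0), w - 1)
    return img[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]

def find_keypoints(dogs, step=1.5):
    """Return (level, row, col) of points that are extrema across three levels."""
    keys = []
    for L in range(1, len(dogs) - 1):  # needs a level below and above
        d = dogs[L]
        for r in range(1, d.shape[0] - 1):
            for c in range(1, d.shape[1] - 1):
                v = d[r, c]
                # 8 neighbours at the same level (drop the centre of the 3x3 block)
                others = np.delete(d[r - 1:r + 2, c - 1:c + 2].ravel(), 4)
                if v > others.max():
                    sign = 1.0
                elif v < others.min():
                    sign = -1.0
                else:
                    continue
                # Level below is finer: its coordinates are 1.5x larger
                below = window3(dogs[L - 1], int(round(r * step)), int(round(c * step)))
                if not np.all(sign * v > sign * below):
                    continue
                # Level above is coarser: its coordinates are 1.5x smaller
                above = window3(dogs[L + 1], int(round(r / step)), int(round(c / step)))
                if np.all(sign * v > sign * above):
                    keys.append((L, r, c))
    return keys

dogs = [np.zeros((15, 15)), np.zeros((10, 10)), np.zeros((7, 7))]
dogs[1][5, 5] = 1.0  # an isolated peak on the middle level
print(find_keypoints(dogs))
```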

- For all levels, use the “A” smoothed image to compute
- Gradient magnitude, from pixel differences:

$$M_{ij} = \sqrt{(A_{ij} - A_{i+1,j})^2 + (A_{ij} - A_{i,j+1})^2}$$

- Threshold gradient magnitudes:
- Remove all key points with M_ij less than 0.1 times the maximum gradient value

- Motivation: low-contrast points are generally less reliable feature points than high-contrast ones

- For each remaining key point:
- Choose the surrounding N x N window at the DOG level where it was detected

- For all levels, use the “A” smoothed image to compute
- Gradient orientation:

$$R_{ij} = \operatorname{atan2}\big(A_{ij} - A_{i+1,j},\; A_{i,j+1} - A_{ij}\big)$$

[Figure: the Gaussian-smoothed image yields a gradient magnitude map and a gradient orientation map]

- Gradient magnitude weighted by a 2D Gaussian

[Figure: gradient magnitude × 2D Gaussian = weighted magnitude]
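The magnitude and orientation maps, and the Gaussian weighting of a window around a key point, can be sketched as below. The pixel differences follow the M_ij / R_ij definitions (differences with the pixel below and to the right); the window half-width and Gaussian σ are free parameters we chose, and the helper names are hypothetical.

```python
import numpy as np

def gradient_mag_ori(A):
    """Pixel-difference gradient magnitude M and orientation R of smoothed image A."""
    A = np.asarray(A, dtype=float)
    d_i = A[:-1, :-1] - A[1:, :-1]   # A_ij - A_{i+1,j}
    d_j = A[:-1, 1:] - A[:-1, :-1]   # A_{i,j+1} - A_ij
    M = np.sqrt(d_i**2 + d_j**2)
    R = np.arctan2(d_i, d_j)         # radians in (-pi, pi]
    return M, R

def gaussian_weighted(M, r, c, half, sigma):
    """Window of M centred on (r, c), multiplied by a 2D Gaussian of width sigma."""
    win = M[r - half:r + half + 1, c - half:c + half + 1]
    off = np.arange(-half, half + 1, dtype=float)
    yy, xx = np.meshgrid(off, off, indexing="ij")
    return win * np.exp(-(xx**2 + yy**2) / (2 * sigma**2))

A = np.tile(np.arange(10.0), (10, 1))   # brightness increases to the right
M, R = gradient_mag_ori(A)
W = gaussian_weighted(M, 4, 4, half=2, sigma=1.5)
print(M[0, 0], R[0, 0], W.shape)
```

The outputs are one pixel smaller than A in each direction because the last row and column have no neighbor below or to the right.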

- Accumulate in histogram based on orientation
- Histogram has 36 bins with 10° increments

- Identify peak and assign orientation and sum of magnitude to key point

[Figure: orientation histogram, the sum of weighted magnitudes per gradient-orientation bin, with the peak marked]
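The 36-bin histogram and peak selection can be sketched as below, taking the weighted magnitudes and orientations (in radians) of the window as input. Reporting the peak as the bin centre is our assumption, as are the helper name and the choice to return the whole histogram too.

```python
import numpy as np

def dominant_orientation(weighted_mag, ori, nbins=36):
    """Accumulate weighted magnitudes into 10-degree bins; return (peak angle, peak sum, histogram)."""
    deg = np.degrees(np.asarray(ori, dtype=float)) % 360.0
    idx = np.minimum((deg // (360.0 / nbins)).astype(int), nbins - 1)
    hist = np.bincount(idx.ravel(), weights=np.ravel(weighted_mag), minlength=nbins)
    peak = int(np.argmax(hist))
    return (peak + 0.5) * 360.0 / nbins, hist[peak], hist  # bin-centre angle

ori = np.full((5, 5), np.radians(45.0))
ori[0, :] = np.radians(200.0)          # a weaker competing direction
angle, strength, hist = dominant_orientation(np.ones((5, 5)), ori)
print(angle, strength)
```

The key point is then assigned this angle together with the summed magnitude of the peak bin.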

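- The Difference-of-Gaussian function has a strong response along edges
- So how can we get rid of these edges?
- Similar to the Harris corner detector: look at the eigenvalues of the 2x2 Hessian H of D at the key point
- We are not concerned about the actual eigenvalues, just the ratio r of the two, which can be tested without computing them (Lowe 2004 uses r = 10):

$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r+1)^2}{r}, \qquad H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$$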
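A sketch of the edge test on a single DOG level, estimating the Hessian with central finite differences at the key point. The finite-difference stencil, the default r = 10 and the helper name `passes_edge_test` are our assumptions; a point whose two curvatures have opposite signs (non-positive determinant) is rejected outright.

```python
import numpy as np

def passes_edge_test(D, r, c, ratio=10.0):
    """True if the principal-curvature ratio at D[r, c] is below `ratio`."""
    dxx = D[r, c + 1] + D[r, c - 1] - 2.0 * D[r, c]
    dyy = D[r + 1, c] + D[r - 1, c] - 2.0 * D[r, c]
    dxy = (D[r + 1, c + 1] - D[r + 1, c - 1]
           - D[r - 1, c + 1] + D[r - 1, c - 1]) / 4.0
    tr = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:          # curvatures of opposite sign (or a flat direction)
        return False
    return tr ** 2 / det < (ratio + 1) ** 2 / ratio

y, x = np.mgrid[-3:4, -3:4].astype(float)
blob = -(x**2 + y**2)            # equal curvature in both directions
ridge = -(x**2 + 0.01 * y**2)    # strong across, almost none along: edge-like
print(passes_edge_test(blob, 3, 3), passes_edge_test(ridge, 3, 3))
```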

- SIFT keys each assigned:
- Location
- Scale (corresponds to the level at which it was detected)
- Orientation (assigned in the previous canonical-orientation step)

- Now: describe the local image region in a way that is invariant to the above transformations

For each key point:

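- Identify an 8x8 neighborhood (from the DOG level where it was detected)
- Rotate it so the key point orientation is aligned with the x-axis

- Calculate the gradient magnitude and orientation maps
- Weight the magnitudes by a Gaussian

- Split the neighborhood into 4x4 regions and calculate a histogram for each: 8 bins for gradient orientation, tallying the weighted gradient magnitude.

- This histogram array is the image descriptor. (In this example the vector has length 4 regions × 8 bins = 32. Lowe's recommended setting is a 16x16 neighborhood split into a 4x4 array of regions, giving a 4 × 4 × 8 = 128-element vector.)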
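The descriptor steps can be sketched as below for a square patch of gradient magnitudes and orientations around the key point. Simplifications we are assuming: rotation invariance is obtained by subtracting the key point orientation from each gradient orientation (the sampling grid itself is not rotated), the Gaussian σ is half the patch width, and no normalization is applied. `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np

def sift_like_descriptor(mag, ori, kp_angle, cell=4, nbins=8):
    """Concatenated 8-bin orientation histograms of each cell x cell region."""
    n = mag.shape[0]
    assert mag.shape == ori.shape == (n, n) and n % cell == 0
    # Gaussian weight centred on the patch; sigma = n / 2 is an assumption
    coords = np.arange(n) - (n - 1) / 2.0
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    weighted = mag * np.exp(-(xx**2 + yy**2) / (2 * (n / 2.0) ** 2))
    # Orientations relative to the key point orientation (rotation invariance)
    rel = np.mod(ori - kp_angle, 2 * np.pi)
    bins = np.minimum((rel / (2 * np.pi) * nbins).astype(int), nbins - 1)
    desc = []
    for r0 in range(0, n, cell):
        for c0 in range(0, n, cell):
            b = bins[r0:r0 + cell, c0:c0 + cell].ravel()
            w = weighted[r0:r0 + cell, c0:c0 + cell].ravel()
            desc.append(np.bincount(b, weights=w, minlength=nbins))
    return np.concatenate(desc)

rng = np.random.default_rng(1)
d32 = sift_like_descriptor(rng.random((8, 8)), rng.uniform(-np.pi, np.pi, (8, 8)), 0.3)
d128 = sift_like_descriptor(rng.random((16, 16)), rng.uniform(-np.pi, np.pi, (16, 16)), 0.3)
print(d32.shape, d128.shape)
```

Lowe's 2004 paper additionally normalizes the vector to unit length to reduce the effect of illumination changes; that step is left out here because the slides do not mention it.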

- Find all key points identified in the source and target images
- Each key point has a 2D location, scale and orientation, as well as an invariant descriptor vector

- For each key point in the source image, search for the corresponding SIFT features in the target image.
- Find the transformation between the two images using epipolar geometry constraints or an affine transformation.
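For the affine option, the six affine parameters can be estimated by linear least squares from three or more matched locations. A minimal sketch, assuming the matches are already outlier-free (no Hough clustering or RANSAC); `fit_affine` is a hypothetical helper returning a 2x3 matrix [A | t] with dst ≈ A·src + t.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping src_pts onto dst_pts (both (n, 2), n >= 3)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    X = np.hstack([src, np.ones((len(src), 1))])      # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)  # shape (3, 2)
    return params.T                                   # [[a11, a12, tx], [a21, a22, ty]]

M_true = np.array([[1.2, -0.3, 5.0], [0.4, 0.9, -2.0]])
src = np.random.default_rng(2).random((10, 2)) * 100
dst = src @ M_true[:, :2].T + M_true[:, 2]
M_est = fit_affine(src, dst)
print(np.round(M_est, 6))
```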

Feature detection

- Image matching via nearest-neighbor search
- If the ratio of the closest distance to the second-closest distance is greater than 0.8, reject the match as false
- Remove outliers using epipolar line constraints.
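The nearest-neighbor search with the 0.8 distance-ratio test can be sketched with a brute-force Euclidean search. `ratio_test_match` is a hypothetical helper, and the target set must contain at least two descriptors. The comparison is written as d1 <= 0.8·d2 so that a zero second distance never causes a division by zero; a match is rejected exactly when d1/d2 > 0.8.

```python
import numpy as np

def ratio_test_match(src, dst, ratio=0.8):
    """Return (i, j) pairs: src[i] matched to its nearest dst[j], ambiguous ones rejected."""
    matches = []
    for i, s in enumerate(np.asarray(src, dtype=float)):
        dists = np.linalg.norm(np.asarray(dst, dtype=float) - s, axis=1)
        j1, j2 = np.argsort(dists)[:2]          # closest and second closest
        if dists[j1] <= ratio * dists[j2]:      # reject when d1 / d2 > ratio
            matches.append((i, int(j1)))
    return matches

dst = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
src = np.array([[0.5, 0.0],    # clearly closest to dst[0]
                [5.0, 0.0]])   # equidistant from dst[0] and dst[1]: ambiguous
print(ratio_test_match(src, dst))
```

The second source point is rejected because its two nearest candidates are equally far away, which is exactly the kind of ambiguous match the ratio test is meant to discard.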

- SIFT features are reasonably invariant to rotation, scaling, and illumination changes.
- We can use them for image matching and object recognition among other things.
- Efficient on-line matching and recognition can be performed in real time