
# Scale-Invariant Feature Transform (SIFT)

Scale-Invariant Feature Transform (SIFT). Jinxiang Chai. Review. Image Processing - Median filtering - Bilateral filtering - Edge detection - Corner detection. Review: Corner Detection. 1. Compute image gradients

Presentation Transcript

### Scale-Invariant Feature Transform (SIFT)

Jinxiang Chai

Image Processing

- Median filtering

- Bilateral filtering

- Edge detection

- Corner detection

Review: Corner Detection

1. Compute image gradients

2. Construct the matrix from the gradient at each pixel and its neighborhood values

3. Determine the 2 eigenvalues λ(i,j) = [λ1, λ2].

4. If both λ1 and λ2 are big, we have a corner

Corners are detected where both λ1 and λ2 are big
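The four review steps can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function names, the 3x3 summation window, the central-difference gradients and the threshold value are all assumptions made for the example.

```python
import math

def gradient_matrix_eigenvalues(img, r, c, radius=1):
    """Return (lam1, lam2), lam1 >= lam2, of the gradient matrix around pixel (r, c)."""
    sxx = sxy = syy = 0.0
    for i in range(r - radius, r + radius + 1):
        for j in range(c - radius, c + radius + 1):
            # Step 1: image gradients by central differences (an assumed choice).
            gx = (img[i][j + 1] - img[i][j - 1]) / 2.0
            gy = (img[i + 1][j] - img[i - 1][j]) / 2.0
            # Step 2: accumulate the 2x2 matrix [[sxx, sxy], [sxy, syy]].
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    # Step 3: closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (sxx + syy) / 2.0
    dev = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return mean + dev, mean - dev

def is_corner(img, r, c, threshold=0.5):
    # Step 4: both eigenvalues must be big, so it is enough to test the smaller one.
    _, lam2 = gradient_matrix_eigenvalues(img, r, c)
    return lam2 > threshold

# A bright quadrant has a corner at (3, 3); a vertical step edge does not.
corner_img = [[1.0 if i >= 3 and j >= 3 else 0.0 for j in range(7)] for i in range(7)]
edge_img = [[1.0 if j >= 3 else 0.0 for j in range(7)] for i in range(7)]
```

Along a straight edge one eigenvalue stays near zero, which is why testing the smaller eigenvalue separates corners from edges.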

• What are we looking for?

• Strong features

• Invariant to changes (affine and perspective/occlusion)

• Solve the problem of correspondence

• Locate an object in multiple images (e.g. in video)

• Track the path of the object, infer 3D structures, object and camera movement

• Choosing features that are invariant to image scaling and rotation

• Also, partially invariant to changes in illumination and 3D camera viewpoint

• Illumination

• Scale

• Rotation

• Affine

• David G. Lowe, "Object recognition from local scale-invariant features," ICCV 1999

• David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110

• Earlier Methods

• Harris corner detector

• Sensitive to changes in image scale

• Finds locations in image with large gradients in two directions

• No method was fully affine invariant

• Although the SIFT approach is not fully affine invariant, it allows for considerable affine change

• SIFT also allows for changes in 3D viewpoint

• Scale-space extrema detection

• Keypoint localization

• Orientation Assignment

• Generation of keypoint descriptors.

• Different scales are appropriate for describing different objects in the image, and we may not know the correct scale/size ahead of time.

• Looking for features (locations) that are stable (invariant) across all possible scale changes

• use a continuous function of scale (scale space)

• Which scale-space kernel will we use?

• The Gaussian Function

• variable-scale Gaussian

• input image

• To detect stable keypoint locations, find the scale-space extrema in the difference-of-Gaussian function

• Look familiar?

- bandpass filter!
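For reference, the "variable-scale Gaussian" and "input image" labels annotate the scale-space definition. The equations themselves did not survive in the transcript. Following Lowe (2004), cited earlier, with $I$ the input image, $G$ the variable-scale Gaussian, $*$ convolution in $x$ and $y$, and $k$ a constant factor separating nearby scales, they can be written as:

```latex
\begin{aligned}
G(x, y, \sigma) &= \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2} + y^{2})/(2\sigma^{2})} \\
L(x, y, \sigma) &= G(x, y, \sigma) * I(x, y) \\
D(x, y, \sigma) &= \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y)
                 = L(x, y, k\sigma) - L(x, y, \sigma)
\end{aligned}
```

Subtracting two Gaussians of different width removes both the lowest and the highest spatial frequencies, which is why the difference-of-Gaussian acts as a bandpass filter.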

• A = Convolve image with vertical and horizontal 1D Gaussians, σ=sqrt(2)

• B = Convolve A with vertical and horizontal 1D Gaussians, σ=sqrt(2)

• DOG (Difference of Gaussian) = A – B

• So how to deal with different scales?

• Downsample B with bilinear interpolation with pixel spacing of 1.5 (linear combination of 4 adjacent pixels)
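The A, B and DOG steps for a single pyramid level could look roughly like the sketch below. It uses plain Python lists as images. The helper names, the truncation of the kernel at about 3σ and the clamped border handling are assumptions made for the example, not details taken from the slides.

```python
import math

def gaussian_kernel(sigma):
    """1D Gaussian kernel, truncated at about 3 sigma and normalised to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(img, kernel):
    """Convolve every row with a symmetric 1D kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(img[0])
    return [[sum(kernel[k + radius] * row[min(max(c + k, 0), width - 1)]
                 for k in range(-radius, radius + 1))
             for c in range(width)]
            for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma=math.sqrt(2)):
    """Separable blur: a horizontal 1D pass, then a vertical 1D pass."""
    kernel = gaussian_kernel(sigma)
    horizontal = convolve_rows(img, kernel)
    return transpose(convolve_rows(transpose(horizontal), kernel))

def dog_level(img):
    """Return (A, B, DOG) for one pyramid level, with DOG = A - B."""
    a = gaussian_blur(img)   # A: image smoothed with sigma = sqrt(2)
    b = gaussian_blur(a)     # B: A smoothed again with sigma = sqrt(2)
    dog = [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
    return a, b, dog
```

Running two 1D passes instead of one 2D convolution gives the same result for a Gaussian and costs far fewer multiplications per pixel.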

Difference of Gaussian Pyramid

[Figure: three pyramid levels. Input Image → Blur → A1 → Blur → B1, with DOG1 = A1-B1; B1 → Downsample → A2 → Blur → B2, with DOG2 = A2-B2; B2 → Downsample → A3 → Blur → B3, with DOG3 = A3-B3.]

• Initial smoothing ignores highest spatial frequencies of images

- expand the input image by a factor of 2, using bilinear interpolation, prior to building the pyramid

• How to do downsampling with bilinear interpolation?

Weighted sum of four neighboring pixels

[Figure: the sample point S(x,y) lies inside the pixel cell with corners (i,j), (i,j+1), (i+1,j) and (i+1,j+1); its offsets within the cell are labelled u and v.]

Sampling at S(x,y), where a and b are the fractional offsets of the sample point from (i,j) along the j and i directions (0 ≤ a, b ≤ 1):

S(x,y) = (1-a)*(1-b)*S(i,j) + (1-a)*b*S(i+1,j)

+ a*(1-b)*S(i,j+1) + a*b*S(i+1,j+1)

To optimize the above, do the following

Si = S(i,j) + a*(S(i,j+1)-S(i,j))

Sj = S(i+1,j) + a*(S(i+1,j+1)-S(i+1,j))

S(x,y) = Si+b*(Sj-Si)
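A minimal sketch of the factored form (Si, Sj, then S(x,y)) follows. It assumes a and b are the fractional offsets of the sample point from (i,j) along the column and row directions, and that images are lists of rows at least 2x2 in size. The function names and the border clamping are illustrative choices.

```python
def bilinear_sample(img, y, x):
    """Sample img at real-valued row y and column x (both >= 0)."""
    i, j = int(y), int(x)
    # Clamp so that the cell (i, j) .. (i+1, j+1) stays inside the image.
    i = min(i, len(img) - 2)
    j = min(j, len(img[0]) - 2)
    b, a = y - i, x - j   # fractional offsets along rows (b) and columns (a)
    si = img[i][j] + a * (img[i][j + 1] - img[i][j])              # top edge of the cell
    sj = img[i + 1][j] + a * (img[i + 1][j + 1] - img[i + 1][j])  # bottom edge of the cell
    return si + b * (sj - si)

def resample(img, spacing=1.5):
    """Resample img on a grid whose points are `spacing` input pixels apart."""
    rows = int((len(img) - 1) / spacing) + 1
    cols = int((len(img[0]) - 1) / spacing) + 1
    return [[bilinear_sample(img, r * spacing, c * spacing) for c in range(cols)]
            for r in range(rows)]
```

With `spacing=1.5` this performs the downsampling step of the pyramid. Under the same assumptions, `spacing=0.5` gives the factor-of-2 expansion of the input image mentioned earlier.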

[Figure: the pyramid levels side by side: A1, B1, DOG1; A2, B2, DOG2; A3, B3, DOG3.]

• Find maxima and minima of scale space

• For each point on a DOG level:

• Compare to 8 neighbors at same level

• If max/min, identify corresponding point at pyramid level below

• Determine if the corresponding point is max/min of its 8 neighbors

• If so, repeat at pyramid level above

• Repeat for each DOG level

• Those that remain are key points

[Figure: a point on level DOG L compared against the corresponding points on levels DOG L-1 and DOG L+1.]
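The search described in the bullets can be sketched as below. Several details are assumptions rather than statements from the slides. `dogs[0]` is taken to be the finest level, and each later level is coarser by the pixel spacing of 1.5. Corresponding points are found by scaling coordinates and rounding. Extrema must be strict. A maximum must also correspond to maxima in the adjacent levels, and likewise for minima. All function names are hypothetical.

```python
def extremum_sign(level, r, c):
    """+1 if (r, c) is a strict maximum of its 8 neighbours, -1 if a strict minimum, else 0."""
    v = level[r][c]
    neighbours = [level[i][j]
                  for i in range(r - 1, r + 2)
                  for j in range(c - 1, c + 2)
                  if (i, j) != (r, c)]
    if all(v > n for n in neighbours):
        return 1
    if all(v < n for n in neighbours):
        return -1
    return 0

def map_point(r, c, factor, level_to):
    """Nearest interior pixel of level_to to the scaled position (r*factor, c*factor)."""
    rows, cols = len(level_to), len(level_to[0])
    rr = min(max(int(round(r * factor)), 1), rows - 2)
    cc = min(max(int(round(c * factor)), 1), cols - 2)
    return rr, cc

def find_keypoints(dogs, spacing=1.5):
    """Return (level, row, col) for points that are extrema in their own and adjacent levels."""
    keypoints = []
    for l, level in enumerate(dogs):
        for r in range(1, len(level) - 1):
            for c in range(1, len(level[0]) - 1):
                sign = extremum_sign(level, r, c)
                if sign == 0:
                    continue
                # Finer level: coordinates grow by `spacing`; coarser level: they shrink.
                adjacent = ((l - 1, spacing), (l + 1, 1.0 / spacing))
                if all(extremum_sign(dogs[k], *map_point(r, c, f, dogs[k])) == sign
                       for k, f in adjacent if 0 <= k < len(dogs)):
                    keypoints.append((l, r, c))
    return keypoints
```

Testing the 8 same-level neighbours first keeps the cost low, because most pixels are rejected before any adjacent level is touched.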

• For all levels, use the “A” smoothed image to compute the gradient magnitude MIJ and orientation at each pixel

• Remove all key points with MIJ less than 0.1 times the max gradient value

• Motivation: Low-contrast points are generally less reliable than high-contrast ones as feature points
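The formula for the gradient did not survive in the transcript, so the sketch below assumes simple pixel differences on the smoothed image A, in the style of Lowe's ICCV paper. The function names and the `(row, col)` key point format are hypothetical.

```python
import math

def gradient_maps(a):
    """Per-pixel gradient magnitude and orientation (radians) from pixel differences.

    The outputs are one row and one column smaller than the input image.
    """
    rows, cols = len(a) - 1, len(a[0]) - 1
    mag = [[0.0] * cols for _ in range(rows)]
    ori = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            di = a[i][j] - a[i + 1][j]      # difference to the pixel below
            dj = a[i][j + 1] - a[i][j]      # difference to the pixel on the right
            mag[i][j] = math.hypot(di, dj)
            ori[i][j] = math.atan2(di, dj)
    return mag, ori

def drop_low_contrast(keypoints, mag, fraction=0.1):
    """Keep (row, col) key points whose magnitude is at least `fraction` of the maximum."""
    limit = fraction * max(max(row) for row in mag)
    return [(i, j) for (i, j) in keypoints if mag[i][j] >= limit]
```

The same `mag` and `ori` maps feed the later orientation and descriptor steps, so they only need to be computed once per pyramid level.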

• For each remaining key point:

• Choose surrounding N x N window at DOG level it was detected

[Figure: the N x N window in the DOG image and the same window in the Gaussian smoothed image.]

• Gradient magnitude weighted by 2D Gaussian

[Figure: gradient magnitude * 2D Gaussian = weighted magnitude.]

• Accumulate in histogram based on orientation

• Histogram has 36 bins with 10° increments

[Figure: histogram of the sum of weighted magnitudes per orientation bin.]

• Identify peak and assign orientation and sum of magnitude to key point

[Figure: the same histogram with its peak marked.]
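A minimal sketch of the orientation assignment, assuming the N x N windows of gradient magnitude and orientation (in radians) have already been cut out around the key point. The Gaussian width `sigma` is a free parameter here because the slides do not state it, and the function names are hypothetical.

```python
import math

def orientation_histogram(mag, ori, sigma, bins=36):
    """Histogram of orientations over a square window, weighted by a centred 2D Gaussian."""
    n = len(mag)
    centre = (n - 1) / 2.0
    hist = [0.0] * bins
    for i in range(n):
        for j in range(n):
            d2 = (i - centre) ** 2 + (j - centre) ** 2
            weight = math.exp(-d2 / (2.0 * sigma * sigma))
            degrees = math.degrees(ori[i][j]) % 360.0
            # 36 bins of 10 degrees each; the final modulo guards against 360.0 itself.
            hist[int(degrees // (360.0 / bins)) % bins] += weight * mag[i][j]
    return hist

def dominant_orientation(hist):
    """Return (bin-centre angle in degrees, summed weighted magnitude) of the peak bin."""
    peak = max(range(len(hist)), key=lambda k: hist[k])
    width = 360.0 / len(hist)
    return (peak + 0.5) * width, hist[peak]
```

Returning the bin centre limits the angular resolution to 10°, which is the granularity the slides describe.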

• Difference-of-Gaussian function will be strong along edges

• So how can we get rid of these edges?

• Similar to Harris corner detector

• We are not concerned about the actual values of the eigenvalues, just the ratio of the two
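One way to test the eigenvalue ratio without computing the eigenvalues is the trace and determinant check from Lowe (2004). It is applied to the 2x2 Hessian of the DOG image, with a ratio limit of 10 as used in that paper. The finite-difference Hessian and the function name below are assumptions for this sketch.

```python
def passes_edge_test(dog, r, c, ratio=10.0):
    """True if the point is not edge-like, judged by the ratio of principal curvatures."""
    # Second derivatives of the DOG image by finite differences.
    dxx = dog[r][c + 1] - 2.0 * dog[r][c] + dog[r][c - 1]
    dyy = dog[r + 1][c] - 2.0 * dog[r][c] + dog[r - 1][c]
    dxy = (dog[r + 1][c + 1] - dog[r + 1][c - 1]
           - dog[r - 1][c + 1] + dog[r - 1][c - 1]) / 4.0
    trace = dxx + dyy                 # sum of the two eigenvalues
    det = dxx * dyy - dxy * dxy       # product of the two eigenvalues
    if det <= 0.0:
        return False                  # curvatures differ in sign, or one is zero
    # trace^2 / det depends only on the eigenvalue ratio and grows as it grows.
    return trace * trace / det < (ratio + 1.0) ** 2 / ratio
```

Because trace²/det equals (ρ+1)²/ρ for an eigenvalue ratio ρ, comparing it against the same expression at the limit avoids computing any square root.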

• SIFT keys each assigned:

• Location

• Scale (analogous to level it was detected)

• Orientation (assigned in previous canonical orientation steps)

• Now: Describe local image region invariant to the above transformations

For each key point:

• Identify 8x8 neighborhood (from DOG level it was detected)

• Align orientation to x-axis

• Calculate gradient magnitude and orientation map

• Weight by Gaussian

• Calculate histogram of each 4x4 region. 8 bins for gradient orientation. Tally weighted gradient magnitude.

• This histogram array is the image descriptor. (The example here gives a vector of length 8*4 = 32. The recommended choice is a 128-element vector from a 16x16 neighborhood.)
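The descriptor steps can be sketched as below. The sketch assumes the patch of Gaussian-weighted gradient magnitudes and orientations (in radians) is already available. Alignment is done by subtracting the key point's orientation from each gradient orientation rather than by resampling a rotated patch, which is a simplification. The function and parameter names are hypothetical.

```python
import math

def descriptor(mag, ori, keypoint_angle, cell=4, bins=8):
    """Concatenate one orientation histogram per cell x cell region of a square patch.

    mag and ori are n x n lists, with n a multiple of `cell`; angles are in radians.
    """
    n = len(mag)
    per_side = n // cell
    vec = [0.0] * (per_side * per_side * bins)
    for i in range(n):
        for j in range(n):
            # Orientation relative to the key point, wrapped into [0, 2*pi).
            rel = (ori[i][j] - keypoint_angle) % (2.0 * math.pi)
            b = int(rel / (2.0 * math.pi) * bins) % bins
            region = (i // cell) * per_side + (j // cell)
            vec[region * bins + b] += mag[i][j]   # tally weighted gradient magnitude
    return vec
```

An 8x8 patch gives 2*2 regions of 8 bins, so 32 values. A 16x16 patch gives 4*4 regions of 8 bins, so 128 values.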

• Find all key points identified in source and target image

• Each key point will have a 2D location, scale and orientation, as well as an invariant descriptor vector

• For each key point in the source image, search for the corresponding SIFT features in the target image.

• Find the transformation between two images using epipolar geometry constraints or affine transformation.

Feature detection

• Image matching via nearest neighbor search

- if the ratio of the closest distance to the 2nd closest distance is greater than 0.8, reject the match as false.

• Remove outliers using epipolar line constraints.
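The nearest-neighbor search with the 0.8 ratio test could be written as the brute-force sketch below. Descriptors are plain lists of numbers and the function names are hypothetical. A real system would usually replace the linear scan with an approximate nearest-neighbor index.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def match_features(source, target, max_ratio=0.8):
    """Return (source_index, target_index) pairs that pass the distance-ratio test."""
    matches = []
    for s_idx, s in enumerate(source):
        dists = sorted((euclidean(s, t), t_idx) for t_idx, t in enumerate(target))
        if len(dists) < 2:
            continue                      # the ratio needs two candidates
        (best, best_idx), (second, _) = dists[0], dists[1]
        # Reject ambiguous matches: the closest is not clearly better than the runner-up.
        if second == 0.0 or best / second > max_ratio:
            continue
        matches.append((s_idx, best_idx))
    return matches
```

The surviving pairs are only candidates. The outlier removal step with epipolar line constraints is not part of this sketch.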

• SIFT features are reasonably invariant to rotation, scaling, and illumination changes.

• We can use them for image matching and object recognition among other things.

• Efficient on-line matching and recognition can be performed in real time