CSCE 643 Computer Vision: Extractions of Image Features

CSCE 643 Computer Vision: Extractions of Image Features Jinxiang Chai

Good Image Features • What are we looking for? • Strong features • Invariant to changes (affine and perspective, occlusion, illumination, etc.)

Feature Extraction Why do we need to detect features? - Features correspond to important points in both the world and image spaces - Object detection/recognition - Solve the problem of correspondence • Locate an object in multiple images (e.g., in video) • Track the path of the object; infer 3D structure and object and camera movement

Outline Image Features - Corner detection - SIFT extraction

What are Corners? • Point features • Where two edges come together • Where the image gradient has significant components in both the x and y directions • We will establish corners from the gradient images rather than the edge images

Basic Ideas • What are the gradients along the x and y directions? • How do we measure corners based on the gradient images? - Look for two major axes of the gradient distribution in the local window!

How to Find Two Major Axes? • Principal component analysis (PCA) • The relative lengths of the two major axes depend on the ratio of the eigenvalues (λ1/λ2).
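A minimal sketch of the PCA step, assuming the gradients in a local window are given as (Ix, Iy) pairs and that the scatter matrix is left uncentered. The function name `major_axes` and the sample data are illustrative and do not come from the slides.

```python
import math

def major_axes(gradients):
    """Return (lam1, lam2, angle) for a list of (ix, iy) gradient vectors.

    lam1 >= lam2 are the eigenvalues of the 2x2 (uncentered) scatter matrix
    and angle is the direction, in radians, of the first major axis.
    """
    sxx = sum(ix * ix for ix, iy in gradients)
    sxy = sum(ix * iy for ix, iy in gradients)
    syy = sum(iy * iy for ix, iy in gradients)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    mean = 0.5 * (sxx + syy)
    diff = math.hypot(0.5 * (sxx - syy), sxy)
    lam1, lam2 = mean + diff, mean - diff
    # Orientation of the eigenvector that belongs to lam1.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return lam1, lam2, angle

# Gradients that all point along x: one dominant axis (edge-like window).
edge = [(2.0, 0.0), (3.0, 0.0), (1.0, 0.0)]
# Gradients along both x and y: two significant axes (corner-like window).
corner = [(2.0, 0.0), (0.0, 2.0), (2.0, 0.0), (0.0, 2.0)]
edge_result = major_axes(edge)
corner_result = major_axes(corner)
```

In the edge-like window the second eigenvalue is zero, so there is only one major axis; in the corner-like window both eigenvalues are equal and nonzero.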

Corner Detection Algorithm 1. Compute the image gradients 2. Define a neighborhood size as an area of interest around each pixel (e.g., a 3x3 neighborhood)

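Corner Detection Algorithm (cont’d) 3. For each image pixel (i,j), construct the following matrix from the gradients at the pixel and in its neighborhood: $C(i,j) = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}$, where the sums run over the neighborhood. This is similar to the covariance matrix of the gradient vectors $(I_x, I_y)^T$!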
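The first three steps can be sketched as follows, assuming central-difference gradients and a 3x3 window. The helper names `gradients` and `structure_matrix` and the 7x7 test image are hypothetical; the slides do not prescribe a particular gradient operator.

```python
def gradients(img):
    """Central-difference gradients (Ix, Iy) of a 2D list; borders stay at 0."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            ix[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2.0
            iy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2.0
    return ix, iy

def structure_matrix(ix, iy, r, c, radius=1):
    """Sum gradient products over the (2*radius+1)^2 window centred on (r, c)."""
    sxx = sxy = syy = 0.0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            gx, gy = ix[r + dr][c + dc], iy[r + dr][c + dc]
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    return [[sxx, sxy], [sxy, syy]]

# 7x7 test image: a bright square fills rows >= 3 and columns >= 3,
# so its top-left corner sits at row 3, column 3.
img = [[100.0 if (r >= 3 and c >= 3) else 0.0 for c in range(7)] for r in range(7)]
ix, iy = gradients(img)
C_corner = structure_matrix(ix, iy, 3, 3)  # window around the square's corner
C_edge = structure_matrix(ix, iy, 5, 3)    # window on the square's left edge
C_flat = structure_matrix(ix, iy, 1, 1)    # window inside the dark region
```

Only the corner window produces large entries on both diagonal positions; the edge window has a single large entry and the flat window is all zeros.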

Corner Detection Algorithm (cont’d) 4. For each matrix C(i,j), determine the 2 eigenvalues λ(i,j) = [λ1, λ2]. Simple case: $C = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$ • This means the dominant gradient directions align with the x or y axis. • If either λ1 or λ2 is close to zero, then this is not a corner.

Corner Detection Algorithm (cont’d) 4. For each matrix C(i,j), determine the 2 eigenvalues λ(i,j) = [λ1, λ2]. Simple case: [Figure: example patches labeled Interior Region, Edge, Corner, and Isolated pixels, annotated with: large λ1 and small λ2; large λ1 and large λ2; small λ1 and small λ2; λ1, λ2 = 0]

Corner Detection Algorithm (cont’d) 4. For each matrix C(i,j), determine the 2 eigenvalues λ(i,j) = [λ1, λ2]. General case: $C = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$, where R is a 2D rotation • This is just a rotated version of the simple (diagonal) case • If either λ1 or λ2 is close to zero, then this is not a corner. • The eigenvalues are invariant to 2D rotation
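A small numerical check of the rotation-invariance claim, assuming C is a symmetric 2x2 matrix. The functions `eigenvalues_2x2` and `rotate_matrix` and the example values 9 and 1 are made up for illustration.

```python
import math

def eigenvalues_2x2(c):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix."""
    a, b, d = c[0][0], c[0][1], c[1][1]
    mean = 0.5 * (a + d)
    diff = math.hypot(0.5 * (a - d), b)
    return mean + diff, mean - diff

def rotate_matrix(c, theta):
    """Return R^T C R for the 2D rotation R by angle theta (R^T equals R^-1)."""
    cs, sn = math.cos(theta), math.sin(theta)
    r = [[cs, -sn], [sn, cs]]
    rt = [[cs, sn], [-sn, cs]]

    def matmul(x, y):
        return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    return matmul(matmul(rt, c), r)

simple = [[9.0, 0.0], [0.0, 1.0]]                  # simple case: diagonal C
general = rotate_matrix(simple, math.radians(30))  # general case: rotated copy
lam_simple = eigenvalues_2x2(simple)
lam_general = eigenvalues_2x2(general)
```

The rotated matrix has nonzero off-diagonal entries, yet its eigenvalues agree with those of the diagonal matrix up to floating-point error.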

Eigenvalues and Corners • λ1 is large • λ2 is large • Both large: corner

Eigenvalues and Corners • λ1 is large • λ2 is small • Only one large: edge

Eigenvalues and Corners • λ1 is small • λ2 is small • Both small: interior (flat) region

Corner Detection Algorithm (cont’d) 4. For each matrix C(i,j), determine the 2 eigenvalues λ(i,j) = [λ1, λ2]. 5. If both λ1 and λ2 are big, we have a corner (Harris also checks that the ratio of the λs is not too high). ISSUE: The corners obtained will be a function of the threshold!
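Step 5 and the threshold issue can be illustrated with a rough sketch. It assumes a single threshold applied to each eigenvalue, which is a simplification; the Harris ratio check is not included. The function `classify`, the sample matrices, and the thresholds 5000 and 7000 are illustrative choices.

```python
import math

def classify(c, threshold):
    """Label a symmetric 2x2 gradient matrix C using its eigenvalues."""
    a, b, d = c[0][0], c[0][1], c[1][1]
    mean = 0.5 * (a + d)
    diff = math.hypot(0.5 * (a - d), b)
    lam1, lam2 = mean + diff, mean - diff  # lam1 >= lam2
    if lam2 > threshold:
        return "corner"    # both eigenvalues are big
    if lam1 > threshold:
        return "edge"      # only one eigenvalue is big
    return "interior"      # both eigenvalues are small

samples = {
    "strong corner": [[10000.0, 2500.0], [2500.0, 10000.0]],  # eigenvalues 12500, 7500
    "weak corner": [[6000.0, 0.0], [0.0, 6000.0]],            # eigenvalues 6000, 6000
    "edge": [[10000.0, 0.0], [0.0, 0.0]],                     # eigenvalues 10000, 0
}
labels_at_5000 = {name: classify(c, 5000.0) for name, c in samples.items()}
labels_at_7000 = {name: classify(c, 7000.0) for name, c in samples.items()}
```

Raising the threshold from 5000 to 7000 turns the weak corner into an interior region, so the set of detected corners depends on the threshold.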

Image Gradients

Image Gradients Closeup of image orientation at each pixel

The Orientation Field Corners are detected where both λ1 and λ2 are big

Corner Detection Sample Results [Figure: detected corners for Threshold=25,000, Threshold=10,000, and Threshold=5,000]

Outline Image Features - Corner detection - SIFT extraction

Scale Invariant Feature Transform (SIFT) • Choosing features that are invariant to image scaling and rotation • Also, partially invariant to changes in illumination and 3D camera viewpoint

Motivation for SIFT • Earlier methods, e.g., the Harris corner detector - Finds locations in the image with large gradients in two directions - Sensitive to changes in image scale • No earlier method was fully affine invariant • Although the SIFT approach is not fully affine invariant, it tolerates considerable affine change • SIFT also tolerates changes in 3D viewpoint

Invariance • Illumination • Scale • Rotation • Affine

Readings • David G. Lowe, "Object recognition from local scale-invariant features," ICCV 1999 • David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110

SIFT Algorithm Overview • Scale-space extrema detection • Keypoint localization • Orientation Assignment • Generation of keypoint descriptors.

Scale Space • Different scales are appropriate for describing different objects in the image, and we may not know the correct scale/size ahead of time.

Scale space (Cont.) • Looking for features (locations) that are stable (invariant) across all possible scale changes • use a continuous function of scale (scale space) • Which scale-space kernel will we use? • The Gaussian Function

Scale-Space of Image • $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$, where $G(x, y, \sigma)$ is the variable-scale Gaussian, $I(x, y)$ is the input image, and $*$ denotes convolution • To detect stable keypoint locations, find the scale-space extrema of the difference-of-Gaussian function $D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$ • Look familiar? It is a band-pass filter!
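One way to see the band-pass behaviour is in the frequency domain. The sketch below assumes unit-area Gaussians and the usual continuous Fourier transform; the symbols $\omega$ (frequency magnitude), $k > 1$ (scale ratio), and the hat notation are introduced here for illustration.

```latex
\begin{aligned}
\hat{G}_{\sigma}(\omega) &= e^{-\sigma^{2}\omega^{2}/2}, \\
\hat{D}(\omega) &= \hat{G}_{k\sigma}(\omega) - \hat{G}_{\sigma}(\omega)
                 = e^{-k^{2}\sigma^{2}\omega^{2}/2} - e^{-\sigma^{2}\omega^{2}/2},
                 \qquad k > 1, \\
\hat{D}(0) &= 0, \qquad \lim_{\omega \to \infty} \hat{D}(\omega) = 0 .
\end{aligned}
```

The response vanishes at zero frequency and at very high frequencies and is nonzero in between, so the difference of two Gaussians passes only a band of spatial frequencies.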

Difference of Gaussian • A = Convolve image with vertical and horizontal 1D Gaussians, σ=sqrt(2) • B = Convolve A with vertical and horizontal 1D Gaussians, σ=sqrt(2) • DOG (Difference of Gaussian) = A – B • So how to deal with different scales?

Difference of Gaussian (cont’d) • To handle different scales, downsample B using bilinear interpolation with a pixel spacing of 1.5 (each new sample is a linear combination of 4 adjacent pixels), then repeat the blur-and-subtract steps at the next pyramid level
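The A, B, and DOG steps can be sketched in plain Python. This is an illustration under stated assumptions, not the reference implementation: the kernel is truncated at about 3σ and image borders are clamped, neither of which is specified on the slides, and all function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel truncated at about 3 sigma."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(img, kernel):
    """Convolve every row with a symmetric 1D kernel, clamping at the border."""
    radius = len(kernel) // 2
    w = len(img[0])
    return [[sum(kernel[k + radius] * row[min(max(c + k, 0), w - 1)]
                 for k in range(-radius, radius + 1))
             for c in range(w)]
            for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def blur(img, sigma):
    """Separable Gaussian blur: horizontal 1D pass, then vertical 1D pass."""
    kernel = gaussian_kernel(sigma)
    horizontal = convolve_rows(img, kernel)
    return transpose(convolve_rows(transpose(horizontal), kernel))

def difference_of_gaussian(img, sigma=math.sqrt(2)):
    a = blur(img, sigma)  # A: the image smoothed once
    b = blur(a, sigma)    # B: A smoothed again
    dog = [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
    return a, b, dog

# 9x9 test image with a single bright pixel in the centre.
img = [[0.0] * 9 for _ in range(9)]
img[4][4] = 255.0
a, b, dog = difference_of_gaussian(img)

# A constant image has no detail, so its DOG should be (numerically) zero.
flat = [[10.0] * 9 for _ in range(9)]
_, _, dog_flat = difference_of_gaussian(flat)
```

The DOG responds at the bright pixel and is essentially zero on the constant image, consistent with a filter that suppresses the zero-frequency component.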

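Difference of Gaussian Pyramid [Figure: the input image is blurred to give A1 and blurred again to give B1, with DOG1 = A1 - B1; downsampling and blurring produce A2, B2 and A3, B3 at coarser levels, with DOG2 = A2 - B2 and DOG3 = A3 - B3]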

Other issues • Initial smoothing ignores the highest spatial frequencies of the image - Expand the input image by a factor of 2, using bilinear interpolation, prior to building the pyramid • How to do downsampling with bilinear interpolation?

Bilinear Filter • Weighted sum of four neighboring pixels [Figure: a sample point among four neighboring pixels, with axes x, y and offsets u, v]

Bilinear Filter • Sampling at S(x,y), which lies in the cell with corners (i,j), (i,j+1), (i+1,j), (i+1,j+1). Let a be the fractional offset from column j toward j+1 and b the fractional offset from row i toward i+1 (both between 0 and 1):
S(x,y) = (1-a)*(1-b)*S(i,j) + a*(1-b)*S(i,j+1) + (1-a)*b*S(i+1,j) + a*b*S(i+1,j+1)
• To optimize the above, do the following:
Si = S(i,j) + a*(S(i,j+1)-S(i,j))
Sj = S(i+1,j) + a*(S(i+1,j+1)-S(i+1,j))
S(x,y) = Si + b*(Sj-Si)
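A short sketch of bilinear sampling and of resampling with a pixel spacing of 1.5. It assumes a is the fractional offset from column j toward j+1 and b the fractional offset from row i toward i+1; the function names and the 7x7 test image are illustrative.

```python
def bilinear_direct(S, i, j, a, b):
    """Weighted sum of the four neighbours of the sample point."""
    return ((1 - a) * (1 - b) * S[i][j] + a * (1 - b) * S[i][j + 1]
            + (1 - a) * b * S[i + 1][j] + a * b * S[i + 1][j + 1])

def bilinear_fast(S, i, j, a, b):
    """Same value using three linear interpolations."""
    si = S[i][j] + a * (S[i][j + 1] - S[i][j])
    sj = S[i + 1][j] + a * (S[i + 1][j + 1] - S[i + 1][j])
    return si + b * (sj - si)

def resample(S, spacing=1.5):
    """Resample S on a grid whose samples are `spacing` pixels apart."""
    h, w = len(S), len(S[0])
    out = []
    y = 0.0
    while y <= h - 1:
        row = []
        x = 0.0
        while x <= w - 1:
            # Clamp the cell index so that i+1 and j+1 stay inside the image.
            i, j = min(int(y), h - 2), min(int(x), w - 2)
            row.append(bilinear_fast(S, i, j, x - j, y - i))
            x += spacing
        out.append(row)
        y += spacing
    return out

# Test image whose value is 10*row + column, so interpolation is easy to check.
S = [[float(10 * r + c) for c in range(7)] for r in range(7)]
direct = bilinear_direct(S, 2, 3, 0.25, 0.5)  # sample at row 2.5, column 3.25
fast = bilinear_fast(S, 2, 3, 0.25, 0.5)
small = resample(S)  # 7x7 -> 5x5 with a pixel spacing of 1.5
```

Both forms give the same value; the second needs fewer multiplications per sample, which matters when every output pixel of a pyramid level is computed this way.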

Pyramid Example [Figure: example images A1, B1, DOG1; A2, B2, DOG2; and A3, B3, DOG3 at three pyramid levels]
