SIFT on GPU (the slides are not updated for newer versions of SiftGPU)
Changchang Wu, 5/8/2007





Outline

  • Background and related implementation

  • SIFT on GPU (SiftGPU)

    • Goal: fast, general, flexible

  • Conclusion and Future Work

SIFT (Lowe, IJCV04)

  • Scale Invariant Feature Transform

    • Detect and describe features that are invariant to similarity transformation

  • Popular technique in computer vision

    • Panorama Generation

    • Microsoft Photosynth

    • Content Based Image Retrieval

SIFT (Lowe, IJCV04)

  • Scale-space extrema detection

    • Difference-of-Gaussian function

    • A close approximation of the scale-normalized Laplacian of Gaussian; more stable than the gradient, Hessian, or Harris corner function

    • Maxima and minima of the DoG are invariant to scale change
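For reference, the approximation mentioned above can be written out as follows. This is the relation given in Lowe's paper, with G the Gaussian kernel, σ the scale, and k the ratio between adjacent scales; the notation is not taken from the slide.

$$
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
\quad\Longrightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G
$$

The factor (k - 1) is constant across scales, so it does not change where the extrema are located.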

Scale-space Construction

σ doubles for the next octave, just resample

For each octave: s intervals, k = 2^(1/s), s+3 Gaussian images
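A minimal sketch of the scale schedule described above may help. The function name `octave_sigmas` and the base blur `sigma0 = 1.6` are assumptions made for illustration; the slide gives only s, k, and the image count.

```python
def octave_sigmas(s, sigma0=1.6):
    """Blur levels of the s+3 Gaussian images in one octave.

    sigma0 = 1.6 is an assumed base blur, not a value taken from the slide.
    """
    k = 2.0 ** (1.0 / s)  # scale ratio between adjacent levels
    return [sigma0 * k ** i for i in range(s + 3)]


sigmas = octave_sigmas(s=3)
# s+3 Gaussian images give s+2 DoG images, so s DoG levels have a
# neighbour both above and below. The image at index s has twice the
# base blur, and resampling it seeds the next octave.
```

With s = 3 this yields six blur levels, the fourth of which (index 3) is twice the base blur.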

Finding Local Extrema

DOG space

Comparing a pixel (marked with X) to its 26 neighbors in 3x3 regions at the current and adjacent scales (marked with circles).
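The 26-neighbor comparison can be sketched on the CPU as below. The nested-list layout `dog[scale][row][column]` and the name `is_extremum` are assumptions for illustration only.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly larger or strictly smaller than all
    26 neighbours in the 3x3x3 cube around it (scale, row, column)."""
    center = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return center > max(neighbours) or center < min(neighbours)
```

The caller is expected to pass only interior positions, so that every index in the cube is valid.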

Also assign orientations to keypoints using the maxima of the local gradient orientation histogram (in a window of size 3*sigma)



  • Sub-pixel localization

    • Fitting 3D quadratic function in the 3x3x3 cube to find sub-pixel location

  • Edge elimination
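The quadratic fit for sub-pixel localization can be sketched as one Newton step on the 3x3x3 cube: estimate the gradient and Hessian by finite differences and solve for the offset. This follows the general method from Lowe's paper rather than any SiftGPU code; the cube layout `c[scale][row][column]` and the function names are assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def subpixel_offset(c):
    """Offset (dx, dy, ds) of the fitted quadratic's extremum relative to
    the cube centre c[1][1][1], or None if the Hessian is singular."""
    v = c[1][1][1]
    # Gradient by central differences, ordered (x, y, scale).
    g = [(c[1][1][2] - c[1][1][0]) / 2.0,
         (c[1][2][1] - c[1][0][1]) / 2.0,
         (c[2][1][1] - c[0][1][1]) / 2.0]
    # Second derivatives by finite differences.
    dxx = c[1][1][2] - 2.0 * v + c[1][1][0]
    dyy = c[1][2][1] - 2.0 * v + c[1][0][1]
    dss = c[2][1][1] - 2.0 * v + c[0][1][1]
    dxy = (c[1][2][2] - c[1][2][0] - c[1][0][2] + c[1][0][0]) / 4.0
    dxs = (c[2][1][2] - c[2][1][0] - c[0][1][2] + c[0][1][0]) / 4.0
    dys = (c[2][2][1] - c[2][0][1] - c[0][2][1] + c[0][0][1]) / 4.0
    h = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    d = det3(h)
    if abs(d) < 1e-12:
        return None
    # Solve h * offset = -g with Cramer's rule.
    rhs = [-gi for gi in g]
    offset = []
    for col in range(3):
        m = [row[:] for row in h]
        for r in range(3):
            m[r][col] = rhs[r]
        offset.append(det3(m) / d)
    return tuple(offset)
```

An offset larger than 0.5 in any dimension usually indicates that the extremum lies closer to a neighbouring sample; how to handle that case is left out of this sketch.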

Feature descriptor

  • Select Gaussian image at expected scale

  • Compute weighted histogram of gradient orientation (relative to keypoint orientations)

128-D vector = 16 squares x 8 directions
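To make the 16 x 8 layout concrete, here is a simplified sketch that bins a patch of gradients into 4x4 squares with 8 orientation bins each. The 16x16 sample patch, the function name, and the bin ordering are assumptions; the Gaussian weighting and interpolation between bins are deliberately left out, so each sample is weighted by gradient magnitude only.

```python
import math


def descriptor_128(mag, ori, keypoint_ori=0.0):
    """Build a 128-value descriptor from 16x16 gradient samples.

    mag, ori: 16x16 nested lists of gradient magnitude and orientation
    (radians). Orientations are taken relative to keypoint_ori.
    Returns 128 floats, L2-normalized (all zeros if there is no gradient).
    """
    desc = [0.0] * 128
    for y in range(16):
        for x in range(16):
            cell = (y // 4) * 4 + (x // 4)            # which of the 16 squares
            rel = (ori[y][x] - keypoint_ori) % (2.0 * math.pi)
            b = int(rel / (2.0 * math.pi) * 8) % 8    # which of the 8 directions
            desc[cell * 8 + b] += mag[y][x]
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc
```

Subtracting the keypoint orientation before binning is what makes the descriptor rotation invariant.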

Existing implementations

  • CPU version

    • Lowe’s binary

    • Andrea Vedaldi’s SIFT++

    • C# (autopanosift), Matlab…

  • GPU version

    • Sudipta Sinha’s GPUSIFT

    • Sebastian Heymann’s

Current Progress

  • Intensity conversion and sampling (Cg + GLSL)

  • Image pyramid (Cg + GLSL)

  • Keypoint detection (Cg + GLSL)

  • Sub-pixel localization (none)

  • Edge elimination (Cg only)

  • Feature list generation (Cg + GLSL + CPU)

  • Orientation (Cg fp40 only)

  • Display list generation (Cg + GLSL)

  • Descriptor generation (Cg)

  • Visualization (Cg + GLSL, GLUT + Win32)

  • “+” means that multiple versions of the implementation exist

Scale Space Construction

  • Run horizontal and vertical Gaussian filtering separately

    • When the number of DoG levels in an octave is 3, the largest Gaussian kernel can be 19x19

  • Compute the difference of Gaussians in the same pass, since the values have already been read

  • Ping-pong rendering is not used; sometimes the same texture is written and read, because not all channels need to change
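The separable filtering above can be sketched on the CPU as a horizontal pass followed by a vertical pass. This is an illustration only: the function names, the clamp-to-edge border handling, and the explicit `radius` parameter are assumptions, and the DoG is computed here as a plain subtraction rather than inside a shader pass.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights of length 2*radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [v / total for v in weights]


def filter_rows(img, kernel):
    """Convolve every row with the kernel, clamping at the borders."""
    r = len(kernel) // 2
    width = len(img[0])
    return [
        [sum(kernel[k + r] * row[min(max(x + k, 0), width - 1)]
             for k in range(-r, r + 1))
         for x in range(width)]
        for row in img
    ]


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma, radius):
    """Horizontal pass, then vertical pass via a transpose."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = filter_rows(img, kernel)
    return transpose(filter_rows(transpose(horizontal), kernel))


def difference_of_gaussian(img, sigma, k, radius):
    """Blur at k*sigma minus blur at sigma, pixel by pixel."""
    low = gaussian_blur(img, sigma, radius)
    high = gaussian_blur(img, k * sigma, radius)
    return [[b - a for a, b in zip(ra, rb)] for ra, rb in zip(low, high)]
```

Two 1D passes cost about 2n multiplications per pixel for an n-wide kernel instead of n*n for the full 2D kernel, which is what makes the large kernels affordable.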

Color channel mapping

  • Use Texture from Destination instead of PingPong

Keypoint Detection

  • Compare with 26 neighbors?

  • Done in 4 steps

    • Intra-level comparison with the 8 neighbors (the gradient and edge elimination are also computed in this pass)

    • Store the maximum and minimum of the 9 pixels in an auxiliary texture

    • Early z culling based on the in-level suppression

    • Compare with the stored maximum and minimum of the pixel at the upper and lower levels
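The staged comparison can be sketched on the CPU as follows. It is not the shader code: the function names and nested-list images are assumptions, and a plain `continue` stands in for early z culling.

```python
def local_max_min(level):
    """Per-pixel maximum and minimum over the 3x3 neighbourhood (centre
    included), as two images. Border entries are left as None."""
    h, w = len(level), len(level[0])
    mx = [[None] * w for _ in range(h)]
    mn = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            block = [level[y + dy][x + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            mx[y][x], mn[y][x] = max(block), min(block)
    return mx, mn


def detect_keypoints(below, current, above):
    """Return (y, x) positions in `current` that are extrema over all
    26 neighbours, using the stored 9-pixel max/min of adjacent levels."""
    h, w = len(current), len(current[0])
    max_b, min_b = local_max_min(below)
    max_a, min_a = local_max_min(above)
    keypoints = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = current[y][x]
            ring = [current[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)]
            is_max = v > max(ring)
            is_min = v < min(ring)
            if not (is_max or is_min):
                continue  # in-level suppression, the role of early z culling
            if is_max and v > max_b[y][x] and v > max_a[y][x]:
                keypoints.append((y, x))
            elif is_min and v < min_b[y][x] and v < min_a[y][x]:
                keypoints.append((y, x))
    return keypoints
```

Because each level's 9-pixel maximum and minimum are stored once, the cross-level test needs only four lookups per surviving pixel instead of 18.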

Feature List Generation on GPU

  • Use Gernot Ziegler’s histogram pyramid method, using all RGBA channels

  • Do reduction, and read back the highest level.

  • Allocate texture to hold the feature list

  • Traverse the pyramid to get location
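The reduce-then-traverse idea can be illustrated with a one-dimensional histogram pyramid. This is a simplification made for clarity: it merges pairs instead of 2x2 blocks, ignores the RGBA channel packing, and the function names are hypothetical.

```python
def build_pyramid(flags):
    """flags: list of 0/1 values whose length is a power of two.
    Returns the levels from base to top; each level sums pairs of the
    level below, so the top holds the total feature count."""
    levels = [list(flags)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def locate(levels, k):
    """Base-level position of the k-th (0-based) set flag, found by
    walking down from the top and choosing the left or right child."""
    idx = 0
    for level in reversed(levels[:-1]):
        idx *= 2
        left = level[idx]
        if k >= left:      # the k-th feature lies in the right child
            k -= left
            idx += 1
    return idx


def feature_list(flags):
    levels = build_pyramid(flags)
    count = levels[-1][0]  # the only value that has to be read back
    return [locate(levels, k) for k in range(count)]
```

Each output slot k runs its traversal independently, which is why the step maps well to one fragment per feature-list texel.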

Feature Orientation

  • Use a circular window (use 3*sigma as radius)

  • Compute weighted histogram of orientations (36 bins as 9 float4)

    Binary search to locate desired bin


  • Smoothing the histogram

    The smoothing kernel can easily become large

    Apply one (1 3 6 7 6 3 1)/27 kernel as three passes of (1 1 1)/3

  • next
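The kernel identity used for histogram smoothing can be checked with a few lines. The sketch below convolves the 3-tap box kernel with itself and also applies it three times to a circular histogram; the function names are assumptions.

```python
def convolve(a, b):
    """Full discrete convolution of two weight lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def smooth_circular(hist, passes=3):
    """Apply the (1 1 1)/3 kernel `passes` times to a circular histogram."""
    n = len(hist)
    for _ in range(passes):
        hist = [(hist[(i - 1) % n] + hist[i] + hist[(i + 1) % n]) / 3.0
                for i in range(n)]
    return hist


box = [1, 1, 1]
combined = convolve(convolve(box, box), box)
# combined == [1, 3, 6, 7, 6, 3, 1]; the weights sum to 27
```

Three cheap 3-tap passes therefore reproduce the 7-tap kernel while each pass reads only the two adjacent bins.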

Feature Orientation

  • Find the bins that are

    • Larger than 0.8 times the maximum

    • Local maxima

    • Interpolate to get the sub-bin orientation

  • Save the largest N<=4M to RGBA of M output textures

  • Save N to the original texture, and set N to 0 when N is larger than a threshold
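The peak selection can be sketched as below. The parabolic fit through three bins is a common choice for the sub-bin interpolation and is assumed here, as are the function name, the bin-centre convention (bin i covers angles starting at i times the bin width), and output in degrees.

```python
def dominant_orientations(hist, ratio=0.8):
    """Orientations (degrees) of histogram bins that are local maxima and
    reach at least `ratio` times the global maximum, refined by fitting a
    parabola through the bin and its two circular neighbours."""
    n = len(hist)
    bin_width = 360.0 / n
    peak = max(hist)
    result = []
    for i in range(n):
        left, mid, right = hist[(i - 1) % n], hist[i], hist[(i + 1) % n]
        if mid < ratio * peak or mid <= left or mid <= right:
            continue
        # Vertex of the parabola through (-1, left), (0, mid), (1, right).
        offset = 0.5 * (left - right) / (left - 2.0 * mid + right)
        result.append(((i + 0.5 + offset) * bin_width) % 360.0)
    return result
```

A keypoint with several returned orientations produces several features, which is why the feature list has to be rebuilt afterwards.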

Reshape Feature List

  • Rebuild the feature list according to orientations (variable # of orientations)

  • Use the histogram pyramid method

Feature Descriptor

  • Use 4 textures for MRT, and 8 RGBA pixels in each. (8*4*4 = 128)

  • Trilinear interpolation is implemented

A better Geometry Shader version is in progress

Use 2*sigma (instead of 6*sigma) as box size to display here

Display VBO generation

  • Display SIFT features as rotated/scaled square to illustrate scale and orientation.

  • Say the feature texture is WxH (normally H is 1, because no more than 2048..)

  • Make a texture that is Wx(4H)

    • For point (x, y), the index is idx = y*W + x

    • Then the original index is idxo = idx/4 (integer division)

    • And the sub-index is fmod(idx, 4); use the sub-index to offset and rotate this point

  • Copy render result to VBO (vertex buffer object)
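The index arithmetic above can be sketched as follows. The feature tuple layout `(x, y, sigma, angle)`, the function names, and the `box_scale` parameter are assumptions made for illustration; the corner ordering is likewise arbitrary.

```python
import math


def pixel_to_index(x, y, width):
    """Linear index of pixel (x, y) in the Wx(4H) vertex texture."""
    return y * width + x


def square_vertex(idx, features, box_scale=2.0):
    """Vertex position for linear index idx.

    features: list of (x, y, sigma, angle_in_radians) tuples (assumed layout).
    box_scale: side length of the square in units of sigma (assumed).
    """
    feature_index = idx // 4   # which feature this vertex belongs to
    corner = idx % 4           # which of its four corners, i.e. fmod(idx, 4)
    fx, fy, sigma, angle = features[feature_index]
    half_diagonal = box_scale * sigma * math.sqrt(2.0) / 2.0
    # Corners sit at 45, 135, 225, 315 degrees, rotated by the orientation.
    theta = angle + math.pi / 4.0 + corner * math.pi / 2.0
    return (fx + half_diagonal * math.cos(theta),
            fy + half_diagonal * math.sin(theta))
```

Four consecutive indices thus yield the four corners of one feature's square, ready to be drawn as a quad from the vertex buffer.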



  • This SIFT on GPU also tries to give flexibility by providing parameters

    • # of octaves, # of levels, sigma0

    • Starting octave, starting level

    • Filter window size

    • Orientation window size

    • Descriptor window size

  • Shaders are dynamically generated



  • Speed on nVidia 8800

  • 13 Hz on a 640*480 image

  • 4 Hz on a 2048*1536 image

  • Part of the pipeline can run on a laptop

    • Radeon X300 (maximum instruction count is 96)

    • No orientation/edge elimination/descriptor



  • Very close to SIFT++

  • Finished a basic and also flexible framework of SIFT

  • Reduced CPU/GPU data transfer by feature list generation on GPU

Future work

  • Sub-pixel localization

  • Try Geometry Shader or CUDA for descriptor generation

  • Try the packed texture format of Sebastian Heymann’s implementation

  • Compatibility with more graphics cards
