
Finding Body Parts with Vector Processing

Finding Body Parts with Vector Processing. Cynthia Bruyns and Bryan Feldman, CS 252. The presentation takes an existing algorithm for tracking human motion and speeds it up by computing on the GPU, and it argues that many vision algorithms are prime candidates for vector processing. Includes a demo.


Presentation Transcript


  1. Finding Body Parts with Vector Processing • Cynthia Bruyns • Bryan Feldman • CS 252

  2. Introduction • Take an existing algorithm for tracking human motion and speed it up by computing on the GPU • Demonstrate that many vision algorithms are prime candidates for vector processing

  3. Demo • Results after false candidates have been removed

  4. Vision Algorithms • Often computationally expensive: searching over many pixels for objects at many orientations and scales • E.g. [(1024 x 768 pixels) x 3 colors] x [12 orientations] x [5 scales] • Very often highly parallelizable
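
To make the size of that search space concrete, the short sketch below multiplies out the figures quoted on the slide. The variable names are illustrative only; the numbers come from the slide.

```python
# Multiply out the search-space example quoted on the slide.
width, height = 1024, 768
channels = 3
orientations = 12
scales = 5

pixels = width * height                      # samples in one color channel
samples_per_pass = pixels * channels         # all three color channels
total_evaluations = samples_per_pass * orientations * scales

print(f"{total_evaluations:,} filter evaluations per image")
```

Each of those evaluations depends only on a small neighborhood of the image, which is why the work parallelizes well.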

  5. Limb Finding • Goal – find candidate limbs • Limbs look like long dark rectangles on light backgrounds or long light things on dark backgrounds

  6. Algorithm specifics 1. Convolution with a filter (convolve using the FFT) • Response indicates how strongly pixels go from low to high intensity • Convolve over all three color channels so as not to miss a red-to-blue edge of the same intensity
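
As a rough illustration of the convolution step, the sketch below performs a circular 2-D convolution through the FFT in plain Python and defines a direct convolution for comparison. Everything named here (`fft`, `fft2`, `convolve_fft`, `convolve_direct`, the 4x4 test image, and the two-tap difference mask) is an assumption made for illustration: the slides do not give the actual filter, and the original work ran as GPU fragment programs rather than Python. Only one channel is shown; the slide's approach would repeat this per color channel.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft2(grid, inverse=False):
    """2-D transform: FFT every row, then every column."""
    rows = [fft(row, inverse) for row in grid]
    cols = [fft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def convolve_fft(image, mask):
    """Circular convolution of two equally sized power-of-two grids."""
    size = len(image) * len(image[0])
    f_image, f_mask = fft2(image), fft2(mask)
    product = [[a * b for a, b in zip(ra, rb)]
               for ra, rb in zip(f_image, f_mask)]
    # The inverse transform above is unnormalized, so divide by the grid size.
    return [[(v / size).real for v in row]
            for row in fft2(product, inverse=True)]

def convolve_direct(image, mask):
    """Reference circular convolution computed straight from the definition."""
    h, w = len(image), len(image[0])
    return [[sum(image[(y - j) % h][(x - i) % w] * mask[j][i]
                 for j in range(h) for i in range(w))
             for x in range(w)] for y in range(h)]

# Hypothetical test data: a dark-to-light vertical step edge and a
# zero-padded horizontal difference mask.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
mask = [[0.0] * 4 for _ in range(4)]
mask[0][0], mask[0][1] = 1.0, -1.0

response = convolve_fft(image, mask)
```

With this mask the response is positive where intensity rises from left to right (column 2) and negative at the wrap-around edge (column 0), matching the "low to high intensity" reading of the slide.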

  7. Algorithm specifics 2. For every pixel location, get respconv from the "left" and "right" and put the result into a new matrix, resplimb • Figure labels: respconv, -respconv, resplimb
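
One plausible reading of this step is sketched below. The slide does not give the exact formula, so the combination used here (the response a fixed distance to the right minus the response the same distance to the left), the `half_width` parameter, the zero padding at the borders, and the function name `limb_response` are all assumptions.

```python
def limb_response(respconv, half_width):
    """Combine edge responses on either side of each pixel.

    Assumed rule: resplimb[y][x] = respconv[y][x + d] - respconv[y][x - d],
    with d = half_width and out-of-range samples treated as zero.
    """
    height, width = len(respconv), len(respconv[0])

    def sample(y, x):
        return respconv[y][x] if 0 <= x < width else 0.0

    return [[sample(y, x + half_width) - sample(y, x - half_width)
             for x in range(width)] for y in range(height)]

# Hypothetical edge responses for a dark limb on a light background:
# a light-to-dark edge at column 2 and a dark-to-light edge at column 6.
respconv = [[0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 1.0, 0.0]]
resplimb = limb_response(respconv, half_width=2)
```

Under these assumptions the combined response peaks at the column midway between the two edges, which is where a limb of that width would be centered.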

  8. Algorithm specifics 3. Find local maxima: for every pixel, replace its value with the maximum of its local neighbors. If resplimb = locMax, it's a maximum • resplimb (row by row): [.50 .25 .40 .23; .75 .41 .98 .75; .11 .43 .15 .23; .78 .34 .13 .15] • locMax (row by row): [.75 .98 .98 .98; .75 .98 .98 .98; .78 .98 .98 .98; .78 .87 .23 .23]
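
The local-maximum test can be sketched as follows, using the resplimb values from the slide. The 3x3 neighborhood (`radius=1`), the clipping of the window at the image border, and the function names are assumptions; the slide does not state the neighborhood size.

```python
def local_max(grid, radius=1):
    """Replace every value with the maximum over its clipped neighborhood."""
    h, w = len(grid), len(grid[0])
    return [[max(grid[j][i]
                 for j in range(max(0, y - radius), min(h, y + radius + 1))
                 for i in range(max(0, x - radius), min(w, x + radius + 1)))
             for x in range(w)] for y in range(h)]

def find_maxima(grid, radius=1):
    """Return (row, column) positions where the value equals its local max."""
    loc = local_max(grid, radius)
    return [(y, x)
            for y, row in enumerate(grid)
            for x, value in enumerate(row)
            if value == loc[y][x]]

# resplimb values as they appear on the slide, read row by row.
resplimb = [
    [0.50, 0.25, 0.40, 0.23],
    [0.75, 0.41, 0.98, 0.75],
    [0.11, 0.43, 0.15, 0.23],
    [0.78, 0.34, 0.13, 0.15],
]

maxima = find_maxima(resplimb)
```

Both stages are per-pixel operations over a fixed window, so each maps naturally onto a single fragment-program pass.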

  9. GPU • It’s a good choice because each operation is per pixel – SIMD-like • Data stored in texture buffers equivalent to local cache • Clean instruction set and developing interface language to exploit vector operations • Justify your gaming habits

  10. GPU dataflow model • Diagram labels: Application, Vertex Processor, Assembly & Rasterization, Fragment Processor, Framebuffer Operations, Framebuffer, Textures • Hardware supports several data types for bandwidth optimization, e.g. 32-bit floating point, half, etc. • Data passed to main memory stages via binding

  11. Fragment processor has high resource limits • 1024 instructions • 512 constants or uniform parameters • Each constant counts as one instruction • 16 texture units • Reuse as many times as desired • No branching • But, can do a lot with condition codes • No indexed reads from registers • Use texture reads instead • No memory writes
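
The "no branching" restriction is usually worked around by computing both outcomes and blending them with a 0/1 condition value. The sketch below emulates that idea in Python; it is not fragment-program code, and the helper names `select` and `keep_if_peak` are hypothetical.

```python
def select(condition, if_true, if_false):
    """Branch-free choice: both inputs are evaluated, then blended by a mask."""
    mask = float(condition)  # 1.0 when the condition holds, otherwise 0.0
    return mask * if_true + (1.0 - mask) * if_false

def keep_if_peak(response, neighborhood_max):
    """Keep a response only where it equals the neighborhood maximum."""
    return select(response >= neighborhood_max, response, 0.0)

peak = keep_if_peak(0.98, 0.98)
non_peak = keep_if_peak(0.41, 0.98)
```

The cost of this style is that both alternatives are always computed, which is acceptable when each is a few arithmetic operations per pixel.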

  12. The algorithm • Diagram labels: Image, Cylinder Program, Convolution Program, Find Max Program, "For each orientation to search", FFT fragment program (image), Mask, FFT fragment program (mask) • Draw invokes the fragment programs • The texture becomes a data structure; use two as framebuffers to avoid RAW hazards
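
The two-buffer arrangement can be illustrated with a small CPU-side sketch: each pass reads only from one buffer and writes only to the other, then the roles swap. The function `run_passes` and the `shift_right` pass are hypothetical stand-ins for the fragment programs, not code from the presentation.

```python
def run_passes(image, passes):
    """Apply per-pixel passes in sequence using two alternating buffers."""
    h, w = len(image), len(image[0])
    source = [row[:] for row in image]
    target = [[0.0] * w for _ in range(h)]
    for kernel in passes:
        for y in range(h):
            for x in range(w):
                # Every read comes from source and every write goes to
                # target, so no pixel sees a value written in this pass.
                target[y][x] = kernel(source, y, x)
        source, target = target, source  # swap roles for the next pass
    return source

def shift_right(src, y, x):
    """Example pass: each pixel takes the value of its left neighbor."""
    return src[y][x - 1] if x > 0 else 0

result = run_passes([[1, 2, 3, 4]], [shift_right, shift_right])
```

Running the same shift in place, left to right, would overwrite each pixel before its right-hand neighbor had read it, which is the read-after-write hazard the second buffer avoids.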

  13. Results • Mask size fixed (22x13), image size varied (CPU: 2.53 GHz P4; GPU: Nvidia FX5900) • *Additional GPU optimizations possible

  14. Results (log scale) • Mask size fixed (22x13), image size varied • Chart labels: 252.1 sec, 42.7 sec (CPU: 2.53 GHz P4; GPU: Nvidia FX5900) • *Additional GPU optimizations possible

  15. Results • Image size fixed (512x512), mask size varied • Varying mask sizes allow for varying limb sizes on the same image

  16. Results

  17. Comments • GPUs and image processing are a good match • Moving data from CPU to GPU is time-consuming, but this can be overcome • Installations and products are non-uniform, and exact specifications are hearsay

  18. Acknowledgements • Kenneth Moreland • Deva Ramanan • Okan Arikan
