Digital Image Processing With GPU
by Aniruddha Marathe N.

Presentation Transcript

  1. Digital Image Processing With GPU. By: Aniruddha Marathe

  2. Agenda: What should you expect from this presentation?
      • What’s the motivation?
      • What’s a GPU?
      • The GPU Pipeline
      • Programming the GPU
      • Performance
      • Applications

  3. What Should You Expect From This Presentation? A talk centered on the architecture of the underlying hardware rather than on the algorithms that run on it.

  4. What’s the motivation? Image processing algorithms:
      • Involve large volumes of specific types of data
      • Need high computational power (possibly parallel)
      • Demand real-time processing (in most applications)
      • These needs are hard to meet on a CPU alone

  5. What’s a GPU?
      • GPU: Graphics Processing Unit
      • A specialized co-processor
      • Very efficient for:
          • Fast parallel floating-point processing
          • Single Instruction Multiple Data (SIMD) operations
          • High computation per memory access
      • Not as efficient for:
          • Double precision
          • Logical operations on integer data
          • Branching-intensive operations
          • Random-access, memory-intensive operations

  6. What’s a GPU?
      • Dedicated graphics rendering device for:
          • Personal computer, server, game console, mobile device
      • GPU chips:
          • 90%: integrated on the motherboard (low end)
          • 10%: add-on video card (low to high end)
      • Memory:
          • Dedicated video RAM
          • Shared system RAM

  7. GPU: Designed for?
      • As an image rendering device:
          • Highly parallel processor
          • High-bandwidth memory
      • Advanced rendering capabilities:
          • Multi-texturing effects
          • Realistic light and shadow effects
          • Post-processing visual effects
      • Originally in consumer PCs for gaming

  8. Some Definitions
      • Vertex: a data structure for a point in a mesh, containing position, normal and texture coordinates
      • Fragment: a pixel, possibly a sub-pixel, of a rasterized image
      • Shaders: small programs that run on the GPU at specific stages of the GPU pipeline

  9. GPU pipeline (diagram): Program/API → Driver (CPU) → Bus → GPU Front End → Vertex Processing → Primitive Assembly → Rasterization & Interpolation → Fragment Processing → Raster Operations → Framebuffer (GPU)

  11. GPU pipeline: Program/API
      • Program: your program
      • API: either the OpenGL or the DirectX interface

  13. GPU pipeline: Driver
      • Black box
      • Implementations are company secrets
      • Largest bottleneck in many GPU programs

  15. GPU pipeline: GPU Front End
      • Receives commands and data from the driver
      • Communication bridge between the CPU and the GPU
      • Pulls geometry information from system memory
      • Outputs a stream of vertices in object space with all their associated information (normals, texture coordinates, per-vertex color, etc.)
      • The PCI Express bus helps at this stage

  17. GPU pipeline: Vertex Processing
      • Receives vertices from the GPU Front End in object space and outputs them in screen space
      • No new vertices are created in this stage, and no vertices are discarded (input/output has a 1:1 mapping)
      • Normals, texcoords, etc. are also transformed
      • Programmable
      (Diagram: Vertex Processor. Input: vertex with POSITION, NORMAL, BINORMAL*, TANGENT*, TEXCOORD[0-7], COLOR[0-1], PSIZE; shader; textures. Output: data for rasterization (POSITION, PSIZE) and data for interpolation (FOG, TEXCOORD[0-7], COLOR[0-1]).)
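
The object-space to screen-space step described on this slide can be sketched in plain C. This is only an illustration of the arithmetic, not how any particular GPU or driver implements it: the `Vec4` and `Mat4` types, the function names, and the single combined transform matrix `mvp` are all assumptions made for the sketch, and it assumes the transformed `w` is never zero.

```c
#include <stddef.h>

/* Hypothetical helper types for this sketch; they are not OpenGL or DirectX types. */
typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4;  /* row-major: m[row][col] */

/* Multiply a 4x4 matrix by a column vector. */
Vec4 mat4_mul_vec4(const Mat4 *a, Vec4 v) {
    Vec4 r;
    r.x = a->m[0][0] * v.x + a->m[0][1] * v.y + a->m[0][2] * v.z + a->m[0][3] * v.w;
    r.y = a->m[1][0] * v.x + a->m[1][1] * v.y + a->m[1][2] * v.z + a->m[1][3] * v.w;
    r.z = a->m[2][0] * v.x + a->m[2][1] * v.y + a->m[2][2] * v.z + a->m[2][3] * v.w;
    r.w = a->m[3][0] * v.x + a->m[3][1] * v.y + a->m[3][2] * v.z + a->m[3][3] * v.w;
    return r;
}

/* Object space -> clip space (matrix) -> normalized device coordinates
 * (divide by w) -> pixel coordinates (viewport mapping, origin bottom-left). */
Vec4 vertex_to_screen(const Mat4 *mvp, Vec4 obj, float width, float height) {
    Vec4 clip = mat4_mul_vec4(mvp, obj);
    float inv_w = 1.0f / clip.w;  /* assumes clip.w != 0 */
    Vec4 s;
    s.x = (clip.x * inv_w * 0.5f + 0.5f) * width;
    s.y = (clip.y * inv_w * 0.5f + 0.5f) * height;
    s.z = clip.z * inv_w * 0.5f + 0.5f;  /* depth mapped to [0, 1] */
    s.w = clip.w;
    return s;
}

/* One output vertex per input vertex: the 1:1 mapping from the slide. */
void process_vertices(const Mat4 *mvp, const Vec4 *in, Vec4 *out, size_t n,
                      float width, float height) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = vertex_to_screen(mvp, in[i], width, height);
    }
}
```

On a GPU the loop body is what a vertex shader would do for each vertex independently, which is why this stage parallelizes well.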

  19. GPU pipeline: Primitive Assembly
      • Assembles vertices into points, lines and/or polygons
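
As a rough illustration of primitive assembly, the sketch below groups a flat vertex stream into independent triangles, three consecutive vertices per triangle, in the manner of `GL_TRIANGLES`. The `Vertex3` and `Triangle` types and the function name are hypothetical; real hardware also handles points, lines, strips and fans.

```c
#include <stddef.h>

/* Hypothetical types for this sketch. */
typedef struct { float x, y, z; } Vertex3;
typedef struct { Vertex3 v[3]; } Triangle;

/* Group every three consecutive vertices into one triangle.
 * Leftover vertices that do not complete a triangle are ignored.
 * Returns the number of triangles written (at most max_out). */
size_t assemble_triangles(const Vertex3 *verts, size_t n,
                          Triangle *out, size_t max_out) {
    size_t count = 0;
    for (size_t i = 0; i + 2 < n && count < max_out; i += 3) {
        out[count].v[0] = verts[i];
        out[count].v[1] = verts[i + 1];
        out[count].v[2] = verts[i + 2];
        ++count;
    }
    return count;
}
```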

  21. GPU pipeline: Rasterization & Interpolation
      • Rasterization: determines, for each fragment, which triangle or other primitive covers it
      • Interpolation
      (Diagram: the Primitive Assembler passes the primitive type and the data for rasterization (POSITION, PSIZE) to the Rasterizer, which produces rasterized data (DEPTH, barycentric coordinates). The data for interpolation (FOG, TEXCOORD[0-7], COLOR[0-1]) goes to the Interpolator, which produces interpolated data (TEXCOORD[0-7], COLOR[0-1]).)
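
The barycentric coordinates named in the diagram are what tie the two halves of this stage together: they decide whether a fragment lies inside a triangle, and they weight the per-vertex attributes during interpolation. A minimal sketch in C follows, using the common edge-function formulation. The type and function names are assumptions, and perspective correction is deliberately left out.

```c
/* Hypothetical 2D point type for this sketch (screen-space coordinates). */
typedef struct { float x, y; } Point2;

/* Twice the signed area of triangle (a, b, c). */
float edge_function(Point2 a, Point2 b, Point2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Compute the barycentric weights of p with respect to (v0, v1, v2).
 * Returns 1 if p is inside the triangle or on its edge, 0 otherwise
 * (including the degenerate zero-area case). */
int barycentric(Point2 v0, Point2 v1, Point2 v2, Point2 p, float w[3]) {
    float area = edge_function(v0, v1, v2);
    if (area == 0.0f) {
        return 0;
    }
    w[0] = edge_function(v1, v2, p) / area;
    w[1] = edge_function(v2, v0, p) / area;
    w[2] = edge_function(v0, v1, p) / area;
    return w[0] >= 0.0f && w[1] >= 0.0f && w[2] >= 0.0f;
}

/* Interpolate one scalar attribute (for example a texture coordinate
 * or a color channel) from its three per-vertex values. */
float interpolate(const float w[3], float a0, float a1, float a2) {
    return w[0] * a0 + w[1] * a1 + w[2] * a2;
}
```

For the triangle (0,0), (4,0), (0,4) and the point (1,1), the weights come out as 0.5, 0.25 and 0.25, so an attribute that is 1 at the second vertex and 0 elsewhere interpolates to 0.25.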

  23. GPU pipeline: Fragment Processing
      • Programmable
      (Diagram: Fragment Processor. Input: rasterized data (DEPTH), interpolated data (TEXCOORD[0-7], COLOR[0-1]), shader, textures. Output: data for raster operations (DEPTH, COLOR[0-3]) with texture and lighting information.)
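
For image processing, the fragment stage is the interesting one: the input image is bound as a texture, and a small program runs once per output pixel. The C sketch below imitates that model on the CPU with a 3x3 mean filter. It is an assumed example chosen for illustration, not the filter used in the applications later in the talk, and the `Texture` type and function names are hypothetical.

```c
/* Hypothetical grayscale "texture": row-major floats. */
typedef struct {
    int width;
    int height;
    const float *texels;
} Texture;

/* Texture fetch with clamp-to-edge addressing. */
float tex_fetch(const Texture *t, int x, int y) {
    if (x < 0) {
        x = 0;
    }
    if (x >= t->width) {
        x = t->width - 1;
    }
    if (y < 0) {
        y = 0;
    }
    if (y >= t->height) {
        y = t->height - 1;
    }
    return t->texels[y * t->width + x];
}

/* The "fragment program": computes one output pixel from the input
 * texture and depends on no other output pixel. */
float box_filter_fragment(const Texture *t, int x, int y) {
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            sum += tex_fetch(t, x + dx, y + dy);
        }
    }
    return sum / 9.0f;
}

/* CPU stand-in for the GPU, which would run the fragment program
 * for many pixels in parallel instead of looping. */
void run_fragment_program(const Texture *t, float *out) {
    for (int y = 0; y < t->height; ++y) {
        for (int x = 0; x < t->width; ++x) {
            out[y * t->width + x] = box_filter_fragment(t, x, y);
        }
    }
}
```

Because each output pixel is independent, the two loops in `run_fragment_program` are exactly the part the hardware replaces with parallel execution.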

  25. GPU pipeline: Raster Operations
      • Depth checking
          • Check the framebuffer to see whether a lesser depth already exists (Z-buffer)
          • Limited programmability
      • Blending
          • Use the alpha channel to combine the new color with the colors already in the framebuffer
          • Limited programmability
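
The two raster operations on this slide can be sketched as a single function applied to each incoming fragment. The sketch assumes a "less than" depth test and the usual source-over alpha blend; real APIs let the program choose among several comparison and blend modes, and the `Color` and `Pixel` types here are hypothetical.

```c
/* Hypothetical types for this sketch. */
typedef struct { float r, g, b, a; } Color;
typedef struct { Color color; float depth; } Pixel;  /* one framebuffer entry */

/* Source-over blending: weight the new color by its alpha and the
 * stored color by the remainder. */
Color blend_over(Color src, Color dst) {
    Color out;
    out.r = src.a * src.r + (1.0f - src.a) * dst.r;
    out.g = src.a * src.g + (1.0f - src.a) * dst.g;
    out.b = src.a * src.b + (1.0f - src.a) * dst.b;
    out.a = src.a + (1.0f - src.a) * dst.a;
    return out;
}

/* Depth-test the fragment against the framebuffer entry, then blend.
 * Returns 1 if the fragment was written, 0 if it was rejected. */
int raster_op(Pixel *px, Color frag_color, float frag_depth) {
    if (frag_depth >= px->depth) {
        return 0;  /* an equal or lesser depth is already stored */
    }
    px->color = blend_over(frag_color, px->color);
    px->depth = frag_depth;
    return 1;
}
```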

  26. Example: Program/API
      Code Snippet (OpenGL)
          ….
          glBegin(GL_TRIANGLES);
          glTexCoord2f(1,0); glVertex3f(0,1,0);
          glTexCoord2f(0,1); glVertex3f(-1,-1,0);
          glTexCoord2f(0,0); glVertex3f(1,-1,0);
          glEnd();
          …
      (Pipeline diagram: Program/API → Driver → Bus → GPU Front End → Vertex Processing → Primitive Assembly → Rasterization & Interpolation → Fragment Processing → Raster Operations → Framebuffer(s))

  27. Example (pipeline diagram, annotated "01001001100….")

  28. Example (pipeline diagram, annotated "viewing frustum")

  29. Example (pipeline diagram, annotated "screen space")

  30. Example (pipeline diagram, annotated "framebuffer")

  32. Broader View: NVIDIA GeForce 8800 OpenGL Pipeline (block diagram)
      • Logical pipeline: Application → Vertex assembly → Vertex operations → Primitive assembly → Primitive operations → Rasterization → Fragment operations → Frame Buffer
      • Hardware blocks: Data Assembler, Setup / Rstr / ZCull, Vtx Thread Issue, Prim Thread Issue, Frag Thread Issue, Thread Processor
      • Processor array as drawn: 16 SP, 8 TF, 8 L1, 6 L2 and 6 FB boxes

  33. Correspondence (By Color): NVIDIA GeForce 8800 OpenGL Pipeline (same block diagram as the previous slide, color-coded)
      • Fixed-function assembly processors: Vertex assembly (annotated "this was missing"), Primitive assembly, Rasterization (fragment assembly)
      • Application-programmable parallel processor: Vertex operations, Primitive operations, Fragment operations
      • Fixed-function framebuffer operations: Framebuffer

  34. Streaming Processors, Texture Units, and On-chip Caches

  35. Modern GPUs have more ALUs

  36. NVIDIA G80 GPU Architecture Overview
      • 16 multiprocessor blocks
      • Each block has:
          • 8 streaming processors
          • 16 KB shared memory
          • 64 KB constant cache
          • 8 KB texture cache
      • Shared memory: 2-cycle latency
      • Device memory: 300-cycle latency

  37. Programmability in the GPU
      • In a simplified view, three programmable stages:
          • Vertex Engine
          • Fragment Engine
          • Texture Load/Filter Engine

  38. Programmability in the GPU
      • For non-graphics applications, two programmable blocks running serially:
          • Vertex Processor
          • Fragment Processor

  39. Programmability in the GPU
      • Both Vertex and Fragment Processors:
          • Support FP32 operands and intermediate values
          • Use the texture unit as a random-access data fetch unit at 35 GB/sec
      • The programmer can write programs that are executed for every vertex as well as for every fragment
      • This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications

  40. NVIDIA CUDA
      • CUDA (‘Compute Unified Device Architecture’): a parallel computing architecture developed by NVIDIA
      • NVIDIA provides a GPU processing library for programming the GeForce 8800 GPUs
      • ‘C’-style programming
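
To give a feel for the ‘C’-style model, the sketch below imitates in plain C how a CUDA-style kernel maps threads to array elements. It is a serial CPU emulation, not CUDA code: the `ThreadCoords` struct is a hypothetical stand-in for CUDA's built-in `blockIdx.x`, `blockDim.x` and `threadIdx.x`, and `launch_invert` stands in for a kernel launch. The image-inversion kernel is an assumed example.

```c
/* Hypothetical stand-in for CUDA's built-in thread coordinates. */
typedef struct {
    int block_idx;   /* which block this thread belongs to */
    int block_dim;   /* threads per block */
    int thread_idx;  /* index of the thread within its block */
} ThreadCoords;

/* Kernel body: the work done by ONE thread, here inverting one 8-bit pixel. */
void invert_kernel(ThreadCoords t, const unsigned char *in,
                   unsigned char *out, int n) {
    int i = t.block_idx * t.block_dim + t.thread_idx;  /* global element index */
    if (i < n) {  /* the last block may have threads past the end of the data */
        out[i] = (unsigned char)(255 - in[i]);
    }
}

/* Serial emulation of a launch: enough blocks to cover n elements.
 * On a GPU these iterations would run in parallel. */
void launch_invert(const unsigned char *in, unsigned char *out,
                   int n, int block_dim) {
    int grid_dim = (n + block_dim - 1) / block_dim;  /* round up */
    for (int b = 0; b < grid_dim; ++b) {
        for (int t = 0; t < block_dim; ++t) {
            ThreadCoords c = { b, block_dim, t };
            invert_kernel(c, in, out, n);
        }
    }
}
```

The bounds check inside the kernel matters because the number of launched threads is rounded up to a whole number of blocks.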

  41. Time For Some Applications!

  42. Fast De-noising of Images - 1

  43. Fast De-noising of Images - 2

  44. Fast Border Recognition (From GPU4Vision)

  45. Performance

  46. The NVIDIA G80 GPU
      • 128 streaming floating-point processors @ 1.5 GHz
      • 1.5 GB shared RAM with 86 GB/s bandwidth
      • 320 GFLOPS on one chip (single precision)
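
The figures on this slide also show why the earlier "high computation per memory access" point matters. Taking the slide's numbers at face value, and reading the bandwidth as 86 gigabytes per second (an assumption about the units), the chip can perform several floating-point operations in the time it takes to fetch one value from memory. A small C sketch of that ratio, with hypothetical function names:

```c
/* Peak floating-point operations available per byte of memory traffic. */
double flops_per_byte(double peak_gflops, double bandwidth_gb_per_s) {
    return peak_gflops / bandwidth_gb_per_s;
}

/* The same ratio per 4-byte single-precision value fetched. */
double flops_per_float(double peak_gflops, double bandwidth_gb_per_s) {
    return 4.0 * flops_per_byte(peak_gflops, bandwidth_gb_per_s);
}
```

With 320 GFLOPS and 86 GB/s this gives roughly 3.7 operations per byte, or about 15 per single-precision value. An algorithm that does fewer operations than that per value fetched will be limited by memory bandwidth rather than by arithmetic.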

  47. NVIDIA G80 GPU vs. Intel Core 2 Duo