GPU Architecture & Cg - PowerPoint PPT Presentation

emily
gpu architecture cg l.
Skip this Video
Loading SlideShow in 5 Seconds..
GPU Architecture & Cg PowerPoint Presentation
Download Presentation
GPU Architecture & Cg

play fullscreen
1 / 39
Download Presentation
GPU Architecture & Cg
707 Views
Download Presentation

GPU Architecture & Cg

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. GPU Architecture & Cg Mark ColbertPhD CandidateUCF Graphics Group / MCL colbert@cs.ucf.edu © 2006 University of Central Florida

  2. welcome • Assumptions • Experienced in C\C++ • Basic OpenGL\DirectX Knowledge • Some Graphics Knowledge • Familiarity with Geometric Transformations • Linear Algebra • Purpose • Introduction to the Programmable GPU • Extremely Fast Paced • For the Geeky at Heart

  3. overview • GPU Architecture • GPU Pipeline • Introduction to Cg • Implications for GPGPU

  4. GPU • Graphics Processing Unit • Parallelized SIMD Architecture • Denoted as Pipes • 24 fragment pipes on nVidia 7800 • Each Pipe Handles 4 Vector Operations

  5. rules of the game • Not a Generalized Vector Processor • Cannot read and write to same areas of memory • Limited output capability • Currently, very expensive to output to locations arbitrary locations in memory

  6. notation • Vertex • A data structure for a point in a mesh, containing position, normal, texture coordinates and more… • Fragment • A pixel, possibly sub-pixel, of a rasterized image • Shaders • Small programs ran in the GPU at specific stages of the GPU pipeline

  7. memory constructs • Buffered Objects • Uniform Registers/State Table • Interpolated Registers • Temporary Registers • Textures

  8. memory constructs • Buffered Objects • CPU Generated Streams of Data • Limited Modifiability • Example • Vertex Data of a Mesh

  9. memory constructs • Uniform Registers/State Table • Constant Data through the Pipeline • Only Necessarily Constant for 1 Polygon • 32 general purpose registers • State Table Specific Registers • Projection/Model View Matrices • Lights • … and more

  10. memory constructs • Interpolated Registers • Per Vertex Data of a Polygon • Stores Information Interpolated Across Polygon • 10 General Purpose Interpolated Registers

  11. memory constructs • Temporary Registers • Standard Notion of Registers • Temporary Registers for In Shader Calculations

  12. memory constructs • Textures • Closest to Random Access Memory • Expensive to Access • Multiple Dependent AccessesExtremely Expensive

  13. GPU pipeline Program/ API Driver CPU Bus GPU GPU Front End Vertex Processing Primitive Assembly Rasterization & Interpolation Fragment Processing Raster Operations Framebuffer

  14. Program/ API GPU pipeline • Program • Your Program • API • Either OpenGL or DirectX Interface

  15. Driver GPU pipeline • Driver • Black-box • Implementations are Company Secrets • Largest Bottleneck in many GPU programs

  16. GPU Front End GPU pipeline • GPU Front End • Receives commands & data from driver • PCI Express helps at this stage

  17. Vertex Processing GPU pipeline • Vertex Processing • Normally performs transformations • Programmable data for rasterization POSITION vertex POSITION, NORMAL, BINORMAL*, TANGENT*, TEXCOORD[0-7], COLOR[0-1], PSIZE PSIZE Vertex Processor FOG TEXCOORD[0-7] COLOR[0-1] shader data for interpolation textures

  18. Primitive Assembly GPU pipeline • Primitive Assembly • Compiles Vertices into Points, Lines and/or Polygons • Link elements and set rasterizer

  19. Rasterization & Interpolation GPU pipeline • Rasterization • For each fragment determine respective area of triangle (Barycentric Coordinates) or other primitive • Interpolation Primitive Assembler Primitive Type data for rasterization POSITION Rasterizer rasterized data PSIZE DEPTH Barycentric Coordinates FOG TEXCOORD[0-7] COLOR[0-1] Interpolator TEXCOORD[0-7] COLOR[0-1] interpolated data data for interpolation

  20. Fragment Processing GPU pipeline • Fragment Processing • Programmable rasterized data Fragment Processor data for raster ops DEPTH COLOR[0-3] TEXCOORD[0-7] COLOR[0-1] DEPTH shader interpolated data textures

  21. Raster Operations GPU pipeline • Depth Checking • Check framebuffer to see if lesser depth already exists (Z-Buffer) • Limited Programmability • Blending • Use alpha channel to combine colors already in the framebuffer • Limited Programmability

  22. example Program/ API Code Snippet …. glBegin(GL_TRIANGLES); glTexCoord2f(1,0); glVertex3f(0,1,0); glTexCoord2f(0,1); glVertex3f(-1,-1,0); glTexCoord2f(0,0); glVertex3f(1,-1,0); glEnd(); … Driver Bus GPU Front End Vertex Processing Primitive Assembly Rasterization & Interpolation Fragment Processing Raster Operations Framebuffer(s)

  23. GPU example Program/ API Driver Bus GPU Front End 01001001100…. Vertex Processing Primitive Assembly Rasterization & Interpolation Fragment Processing Raster Operations Framebuffer(s)

  24. example Program/ API Driver Bus GPU Front End Vertex Processing viewing frustum Primitive Assembly Rasterization & Interpolation Fragment Processing Raster Operations Framebuffer(s)

  25. example Program/ API Driver Bus GPU Front End Vertex Processing screen space Primitive Assembly Rasterization & Interpolation Fragment Processing Raster Operations Framebuffer(s)

  26. example Program/ API Driver Bus GPU Front End Vertex Processing framebuffer Primitive Assembly Rasterization & Interpolation Fragment Processing Raster Operations Framebuffer(s)

  27. example Program/ API Driver Bus GPU Front End Vertex Processing framebuffer Primitive Assembly Rasterization & Interpolation Fragment Processing Raster Operations Framebuffer(s)

  28. quick architecture notes • Limits in Shader Size • Pixel Shader 3.0 Spec • Vertex Program – 65535 asm instructions • Fragment Program – 65535+ asm instructions • MIMD • Branches are supported with a large overhead • Rasterizer & Interpolator • Programmable in DirectX 10 • Geometric Shaders • Unified Shading Architecture • Xbox 360 – ATI • Pool of processors with load balancing

  29. higher level shading languages • Vectorized languages for designing shader programs • Easy way out of tedious assembly coding • Not Perfect • Results Are Sometimes Clearly Not Optimized • Examples • Cg • GLSL • HLSL

  30. Cg • nVidia’s Solution • Nearly Identical to HLSL • C++ Based • New Intrinsic Classes • New Intrinsic Functions • Semantics

  31. Cg • Intrinsic Classes • Vectorized Primitives • i.e. float2, float3, float4 • 16-bit Floating Point Constructs • half, half2, half3, half4 • not enabled in ARB shaders • Fixed Precision Decimals • fixed, fixed2, fixed3, fixed4 • Not enabled in ARB shaders

  32. Cg • Intrinsic Classes (cont’d) • Membership Access • Constructor • e.g. float4 v = float4(a,b,c,d); • Array Operator • e.g. v[0], v[1], v[2], or v[3] • Swizzle Operator • Re-order/Build Vectors • e.g. v.xyz, v.xxxz, v.yyx, v.yx, v.xyzw • Replaceable with rgba instead of xyzw

  33. Cg • Intrinsic Classes (cont’d) • Matrices • Compounded Vector Classes • e.g. float4x4 • Constructed with multiple vectors • float4 v = float4(a,b,c,d); float4x4 m = float4x4(v,v,v,v); • Samplers • Texture Data Type • sampler1D, sampler2D, samplerRECT, sampler3D • samplerRECT – Same as sampler2D but uses pixel locations as texture coordinates instead of from [0,1]

  34. Cg • Intrinsic Functions • Many have direct correspondence to assembly instructions or good approximations • Linear Algebra Functions • dot(a,b) – Dot Product • mul – Matrix-Matrix, Vector-Matrix, or Matrix-Vector multiplication • Texture Lookup Functions • tex*(sampler* texture, float* texCoord) • * - The dimensionality of the texture

  35. Cg • Intrinsic Functions (cont’d) • Geometric Intrinsic • distance, faceforward, length, normalize, reflect, refract • A good chunk of math.h • Most Taylor series expansions for two coefficients

  36. Cg • Semantics • Binds variables to GPU Memory Constructs • Uniform Registers • In declaration, use keyword uniform in front of variable type • Vertex Data/Interpolated Registers • float* varName : SEMANTIC • Only used as main function parameteror global variable • Textures • Same as uniform variable

  37. FX Composer • Program for quick shader design • Uses Cg as underlying shading language • Additional Semantic Bindings • NOTE: Uses DirectX as base, so uses vector-matrix multiplication notation

  38. FX Composer • Walkthrough Example

  39. GPGPU • General Purpose GPU Processing • Key Notes • Goal to exploit fragment processor • Each pixel represents a compacted 4-component element of data • Most optimal in gathering algorithms • Vertex shader needed to re-order output • Possibly Optimal in Unified Shading Architecture