
Programmable Graphics Hardware CS 446: Real-Time Rendering & Game Technology

David Luebke, University of Virginia





Presentation Transcript


  1. Programmable Graphics Hardware. CS 446: Real-Time Rendering & Game Technology. David Luebke, University of Virginia

  2. Recap: Advanced Texturing • Billboards • Screen-aligned, world-aligned • Point sprites • Imposters • Trees, buildings, portal textures, billboard clouds • Dynamic imposters for “caching” rendering results • Depth textures • Multitexturing • Low-res light maps, hi-res decals, etc. Real-Time Rendering

  3. Textures: Other Important Stuff • Render to texture – framebuffer objects (FBOs) • Multiple render targets • Environment maps • Sphere maps, cube maps (hardware supported) • Shadow maps • A depth texture rendered from the light source (more later) • Relief textures • Demo now, details later
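Shadow maps are covered in detail later; as a preview, the test a fragment program performs against one can be sketched on the CPU in C (used here as a runnable stand-in for Cg). This is a minimal sketch assuming a square map of light-space depths in [0, 1], nearest-texel lookup, and a constant depth bias; `shadow_map` and `shadow_test` are hypothetical names, not part of any API.

```c
#include <stddef.h>

/* Hypothetical shadow map: depths rendered from the light, in [0, 1]. */
typedef struct {
    int size;            /* map is size x size texels */
    const float *depth;  /* row-major: depth[y * size + x] */
} shadow_map;

/* Returns 1 if the point is lit, 0 if it is in shadow.
 * (u, v): the point's light-space texture coordinates in [0, 1).
 * z:      the point's light-space depth in [0, 1].
 * bias:   small offset that guards against self-shadowing. */
int shadow_test(const shadow_map *sm, float u, float v, float z, float bias)
{
    int x = (int)(u * sm->size);
    int y = (int)(v * sm->size);
    /* Clamp to the edge of the map. */
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= sm->size) x = sm->size - 1;
    if (y >= sm->size) y = sm->size - 1;
    float stored = sm->depth[(size_t)y * sm->size + x];
    /* Lit unless something nearer to the light was recorded at this texel. */
    return z - bias <= stored;
}
```

With a bias of zero, a surface compares against its own stored depth and tends to show speckled self-shadowing; the bias trades that for a slight shift of the shadow boundary.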

  4. Textures: Still More Stuff • Normal maps, especially for bump mapping • Gloss maps, reflectance maps, etc. • Generally: • Think of textures as global memory for fragment programs, with built-in filtering • Just starting to be able to access textures in vertex programs too (NVIDIA hardware only, today) • Deferred shading • Projective texture mapping

  5. Next topic: Cg • Many of the techniques we discuss in this class do not depend on programmable graphics hardware • But even those are often easier to implement with it! • And programmable graphics hardware opens up an endless number of tricks and techniques that could not have been implemented efficiently before • So, the next topic is a brief intro to Cg • My apologies to those of you who’ve seen this • My apologies to those of you who haven’t

  6. Acknowledgement & Aside • Much of this lecture comes from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology • For this reason, and because the lab is outfitted with NVIDIA cards, we will focus on NVIDIA tech • I try to mention similarities and differences with ATI, the other main GPU vendor, in lecture and slides • Note: many/most images are from NVIDIA as well

  7. The Graphics Pipeline • A simplified graphics pipeline • Note that pipe widths vary • Many caches, FIFOs, and so on not shown
  [Figure: Application (CPU) → vertices (3D) → Transform & Light (GPU) → transformed, lit vertices (2D) → Assemble Primitives → screen-space triangles (2D) → Rasterize → fragments (pre-pixels) → Shade → final pixels (color, depth). Also shown: graphics state, video memory (textures), and a render-to-texture path.]

  8. GPU Pipeline: Transform • Transform & light (a.k.a. vertex processor) • Transform from “world space” to “image space” • Compute per-vertex lighting Courtesy Mark Harris
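The "transform" half of this stage boils down to a 4×4 matrix multiply followed, after clipping, by a perspective divide and a viewport mapping. A minimal C sketch, assuming OpenGL-style conventions (column vectors, normalized device coordinates in [-1, 1], y pointing up, viewport at the origin) and a combined model-view-projection matrix supplied by the application; the type and function names are hypothetical.

```c
/* 4-component vector and row-major 4x4 matrix (hypothetical types). */
typedef struct { float x, y, z, w; } vec4;
typedef struct { float m[4][4]; } mat4;

/* out = M * v, treating v as a column vector. */
vec4 mat4_mul_vec4(const mat4 *M, vec4 v)
{
    float in[4] = { v.x, v.y, v.z, v.w };
    float out[4];
    for (int r = 0; r < 4; ++r) {
        out[r] = 0.0f;
        for (int c = 0; c < 4; ++c)
            out[r] += M->m[r][c] * in[c];
    }
    vec4 res = { out[0], out[1], out[2], out[3] };
    return res;
}

/* Clip space -> window ("image") space: divide by w to get normalized
 * device coordinates in [-1, 1], then scale and offset to pixels. */
void clip_to_window(vec4 clip, int width, int height, float *px, float *py)
{
    float ndc_x = clip.x / clip.w;
    float ndc_y = clip.y / clip.w;
    *px = (ndc_x * 0.5f + 0.5f) * (float)width;
    *py = (ndc_y * 0.5f + 0.5f) * (float)height;
}
```

Real hardware clips between these two steps; the sketch assumes `clip.w > 0`, i.e. the vertex lies in front of the eye.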

  9. GPU Pipeline: Rasterize • Rasterizer • Convert geometric representation (vertices) to image representation (fragments) • Fragment = image fragment • Pixel + associated data: color, depth, stencil, etc. • Interpolate per-vertex quantities across pixels Courtesy Mark Harris
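Interpolating per-vertex quantities can be sketched with barycentric weights computed from edge functions. This C version is an illustrative assumption: it interpolates linearly in screen space and ignores perspective correction (which hardware applies to texture coordinates and similar inputs), and `interpolate_attr` is a hypothetical name.

```c
typedef struct { float x, y; } point2;

/* Twice the signed area of triangle (a, b, c), a.k.a. the edge function. */
static float edge(point2 a, point2 b, point2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Interpolates per-vertex values f0, f1, f2 at point p inside triangle
 * (v0, v1, v2). Returns 1 and writes *out if p is inside; returns 0 if
 * p is outside or the triangle is degenerate. */
int interpolate_attr(point2 v0, point2 v1, point2 v2,
                     float f0, float f1, float f2,
                     point2 p, float *out)
{
    float area = edge(v0, v1, v2);
    if (area == 0.0f) return 0;
    float w0 = edge(v1, v2, p) / area;  /* weight of v0 */
    float w1 = edge(v2, v0, p) / area;  /* weight of v1 */
    float w2 = edge(v0, v1, p) / area;  /* weight of v2 */
    if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) return 0;
    *out = w0 * f0 + w1 * f1 + w2 * f2;
    return 1;
}
```

The weights sum to 1 and each equals 1 at its own vertex, so a value stored at a vertex is reproduced exactly there.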

  10. GPU Pipeline: Shade • Fragment processors (multiple in parallel) • Compute a color for each pixel • Optionally read colors from textures (images) Courtesy Mark Harris

  11. The Modern Graphics Pipeline • Programmable vertex processor! • Programmable pixel processor!
  [Figure: the pipeline from slide 7, with Transform & Light implemented by a programmable Vertex Processor and Shade by a programmable Fragment Processor]

  12. The Coming Soon Graphics Pipeline • Programmable primitive assembly! • More flexible memory access!
  [Figure: Application (CPU) → Vertex Processor → Assemble Primitives, now a programmable Geometry Processor → Rasterize → Fragment Processor → final pixels (color, depth). Also shown: graphics state, video memory (textures), and a render-to-texture path.]

  13. Precision • 32-bit IEEE floating-point throughout pipeline • Framebuffer • Textures • Fragment processor • Vertex processor • Interpolants

  14. Multiple data types in hardware • Can support 32-bit IEEE floating point throughout pipeline • Vertices, interpolants, framebuffer, textures, computations • Fragment processor also supports: • 16-bit “half” floating point, 12-bit fixed point • These may be faster than 32-bit • Framebuffer/textures also support: • Large variety of fixed-point formats • E.g., classical 8-bit per component RGBA, BGRA, etc. • These formats use less memory bandwidth than FP32
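To make the bandwidth point concrete, here is a small C sketch of 8-bit unsigned normalized (fixed-point) storage and of per-frame byte counts for RGBA8, FP16, and FP32 formats. The round-to-nearest encoding and all names are illustrative assumptions; real hardware adds compression and caching that this ignores.

```c
#include <stdint.h>

/* Bytes per RGBA pixel for three storage formats. */
enum { RGBA8_BYTES = 4 * 1, RGBA16F_BYTES = 4 * 2, RGBA32F_BYTES = 4 * 4 };

/* Encode a [0, 1] color component as 8-bit unsigned normalized fixed point. */
uint8_t unorm8_encode(float c)
{
    if (c < 0.0f) c = 0.0f;
    if (c > 1.0f) c = 1.0f;
    return (uint8_t)(c * 255.0f + 0.5f);  /* round to nearest */
}

/* Decode back to float: only 256 distinct values survive the round trip. */
float unorm8_decode(uint8_t v)
{
    return (float)v / 255.0f;
}

/* Bytes written when filling one frame once, ignoring compression. */
unsigned long frame_bytes(unsigned long width, unsigned long height,
                          unsigned bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}
```

At 1024×768, one RGBA8 frame is 3 MiB while an RGBA FP32 frame is four times that, which is the trade-off the slide refers to.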

  15. Vertex processor capabilities • 4-vector FP32 operations • Condition codes + true data-dependent control flow • Conditional branches, subroutine calls, jump table • Useful for avoiding extra work, e.g.: • Don’t do animation, skinning if vertex will be clipped • Do displacement mapping only for vertices near silhouette • Transcendental arithmetic instructions (e.g. COS) • User clip-plane support • Texture reads (up to 4 textures, unlimited lookups)

  16. Vertex processor limitations • No arbitrary memory write • No “vertex kill” • Can put vertex off-screen • Can make degenerate primitives • Only 32-bit floating-point texture formats supported
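Without a vertex-kill instruction, a vertex program can still remove a primitive by making it degenerate: if every vertex of a triangle writes the same position, the triangle has zero area and produces no fragments. A C sketch, assuming a hypothetical `vertex_program` with an `alive` flag and an arbitrary collapse point `KILLED`; this only works if all vertices of the primitive agree, since collapsing a single vertex merely distorts the triangle.

```c
/* Clip-space position written by a vertex program (hypothetical type). */
typedef struct { float x, y, z, w; } vpos;

/* Arbitrary collapse point shared by every "killed" vertex. */
static const vpos KILLED = { 0.0f, 0.0f, 0.0f, 1.0f };

/* Hypothetical vertex program: passes live vertices through and
 * collapses dead ones onto KILLED. */
vpos vertex_program(vpos in, int alive)
{
    return alive ? in : KILLED;
}

/* Twice the signed screen-space area after the perspective divide;
 * zero means the rasterizer has nothing to fill. */
float tri_area2(vpos a, vpos b, vpos c)
{
    float ax = a.x / a.w, ay = a.y / a.w;
    float bx = b.x / b.w, by = b.y / b.w;
    float cx = c.x / c.w, cy = c.y / c.w;
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}
```

The other workaround on the slide, moving vertices off-screen, relies on clipping instead and likewise needs every vertex of the primitive to land outside the same clip plane.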

  17. NV40-G70 vertex processor resources • 65535 instructions per program • Other statistics (NV30, not sure about NV40-G70): • 16 temporary 4-vector registers • 256 “uniform” parameter registers • 2 address registers (4-vector) • 6 clip-distance outputs

  18. Fragment processor: texture mapping • Texture reads are just another instruction • Allows computed texture coordinates, nested to arbitrary depth • This is a big difference between NVIDIA and ATI right now • Allows multiple uses of a single texture unit • Optional LOD control – can specify filter extent • Think of it as a memory-read instruction, with optional user-controlled filtering
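Treating a texture read as a memory read with built-in filtering can be sketched as a bilinear fetch. This minimal C version assumes a single-channel texture, clamp-to-edge addressing, and texel centers at (i + 0.5) / size as in OpenGL; `texture2d` and `tex2D_bilinear` are hypothetical names (the latter only echoes Cg's `tex2D`).

```c
/* Hypothetical single-channel texture, row-major. */
typedef struct {
    int width, height;
    const float *texels;
} texture2d;

/* Unfiltered read with clamp-to-edge addressing. */
static float texel(const texture2d *t, int x, int y)
{
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= t->width)  x = t->width - 1;
    if (y >= t->height) y = t->height - 1;
    return t->texels[y * t->width + x];
}

/* Bilinearly filtered lookup at normalized coordinates (u, v). */
float tex2D_bilinear(const texture2d *t, float u, float v)
{
    /* Texel centers sit at (i + 0.5) / size, so shift by half a texel. */
    float fx = u * t->width - 0.5f;
    float fy = v * t->height - 0.5f;
    int x0 = (int)fx; if (fx < x0) --x0;  /* floor without libm */
    int y0 = (int)fy; if (fy < y0) --y0;
    float ax = fx - x0, ay = fy - y0;      /* fractional weights */
    float top    = texel(t, x0, y0)     * (1 - ax) + texel(t, x0 + 1, y0)     * ax;
    float bottom = texel(t, x0, y0 + 1) * (1 - ax) + texel(t, x0 + 1, y0 + 1) * ax;
    return top * (1 - ay) + bottom * ay;
}
```

Because the lookup is just a function of its coordinates, those coordinates can come from another lookup, which is the "computed texture coordinates" case: `tex2D_bilinear(&t, tex2D_bilinear(&offsets, u, v), v)`, where `offsets` is a hypothetical second texture.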

  19. Fragment processor capabilities • Dynamic branching • Conditional fragment-kill instruction • Read access to window-space position • Read/write access to fragment Z (but not stencil) • Multiple render targets • Built-in derivative instructions • Partial derivatives w.r.t. screen-space x or y • Useful for anti-aliasing shaders • FP32, FP16, and fixed-point data
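Hardware typically shades fragments in 2×2 quads and obtains the derivative instructions by differencing values between neighboring pixels of the quad. A C sketch of that idea, assuming the coarse variant (one difference per quad); `quad2x2` and the `quad_*` names are hypothetical, and `quad_fwidth` mirrors Cg's `fwidth(x)`, defined as `abs(ddx(x)) + abs(ddy(x))`.

```c
/* Values of some quantity in a 2x2 pixel quad:
 * q[0] = (x, y), q[1] = (x+1, y), q[2] = (x, y+1), q[3] = (x+1, y+1). */
typedef struct { float q[4]; } quad2x2;

/* Coarse partial derivatives w.r.t. screen-space x and y,
 * as finite differences across the quad. */
float quad_ddx(const quad2x2 *p) { return p->q[1] - p->q[0]; }
float quad_ddy(const quad2x2 *p) { return p->q[2] - p->q[0]; }

/* Filter-width estimate for anti-aliasing, in the style of fwidth(). */
float quad_fwidth(const quad2x2 *p)
{
    float dx = quad_ddx(p), dy = quad_ddy(p);
    return (dx < 0 ? -dx : dx) + (dy < 0 ? -dy : dy);
}
```

An anti-aliased shader can use this width to blur a hard edge (for example a stripe boundary in a procedural texture) over roughly one pixel instead of letting it alias.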

  20. Fragment processor limitations • Dynamic branching less efficient than in the vertex processor • Especially for non-coherent branching (branch decisions that vary within regions smaller than about 30×30 pixels) • Can do a lot with condition codes • No indexed reads from registers • I.e., no indexed arrays • Must use texture reads instead • No arbitrary memory write
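The usual workaround for the missing indexed arrays is to store the array in a 1D texture and turn the index into the coordinate of a texel center. A C sketch, assuming nearest-texel sampling and clamp-to-edge addressing; `texture1d`, `tex1D_nearest`, and `indexed_read` are hypothetical names.

```c
/* A 1D "texture" standing in for an array that cannot be indexed directly. */
typedef struct { int size; const float *texels; } texture1d;

/* Nearest-texel lookup at normalized coordinate u, clamped to the edge. */
float tex1D_nearest(const texture1d *t, float u)
{
    int i = (int)(u * t->size);
    if (u < 0.0f) i = 0;
    if (i >= t->size) i = t->size - 1;
    return t->texels[i];
}

/* Emulates array[i]: aim at the center of texel i so that rounding
 * never selects a neighboring element. */
float indexed_read(const texture1d *t, int i)
{
    float u = ((float)i + 0.5f) / (float)t->size;
    return tex1D_nearest(t, u);
}
```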

  21. Fragment processor resources • 65535+ instructions • Nearly unlimited constants • Each constant counts as one instruction • 16 texture units (NV30; still true?), reuse as often as desired • 10 FP32 × 4 perspective-correct inputs (e.g., tex coords) • Up to 4 128-bit framebuffer “color” outputs • Can pack as 4 × FP32, 8 × FP16, etc. • Can also set the depth output • 24 or 32 bits, depending on stencil • Changing depth in fragment program may disable Z-optimizations
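The "8 × FP16" packing can be sketched in C: convert each FP32 value to IEEE half-precision bits and place two halves in each 32-bit word of a 128-bit output. This is an illustrative assumption, not the hardware's exact behavior: the conversion rounds toward zero, flushes results too small for a normal half to zero, turns overflow (and NaN) into infinity, and puts the even-indexed value in the low 16 bits.

```c
#include <stdint.h>
#include <string.h>

/* FP32 -> FP16 bits, round toward zero (simplified; see lead-in). */
uint16_t float_to_half_rtz(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);                       /* reinterpret bits */
    uint16_t sign = (uint16_t)((b >> 16) & 0x8000u);
    int32_t e = (int32_t)((b >> 23) & 0xFFu) - 127 + 15;  /* rebias exponent */
    uint32_t mant = b & 0x7FFFFFu;
    if (e <= 0)  return sign;                       /* too small: flush to zero */
    if (e >= 31) return (uint16_t)(sign | 0x7C00u); /* too large: infinity */
    return (uint16_t)(sign | ((uint32_t)e << 10) | (mant >> 13));
}

/* Pack eight values as FP16 into one 128-bit (4 x 32-bit) output. */
void pack8_half(const float in[8], uint32_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        uint32_t lo = float_to_half_rtz(in[2 * i]);
        uint32_t hi = float_to_half_rtz(in[2 * i + 1]);
        out[i] = lo | (hi << 16);
    }
}
```

The point is that the same 128 bits can hold either four FP32 values or eight FP16 values, at the cost of precision and range (the largest finite half is 65504).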

  22. GPU vendor differences • Note: this slide will be dated almost instantly • NVIDIA: as described in previous slides • ATI hardware today (X1900 XT is the current high-end part): • No vertex texture fetch (but good render-to-vertex-array) • Far fewer levels of computed texture coordinates • Better at fine-grained (less coherent) dynamic branching • ATI Xenos (Xbox 360 chip): • Unified shader model: vertex proc == pixel proc • Scatter support: shaders can write arbitrary memory locations

  23. Cg – “C for Graphics” • Cg is a high-level GPU programming language • Designed by NVIDIA and Microsoft • Competes with the (quite similar) OpenGL Shading Language, a.k.a. GLSL or GLslang

  24. Programming in assembly is painful
    Assembly:
      …
      FRC R2.y, C11.w;
      ADD R3.x, C11.w, -R2.y;
      MOV H4.y, R2.y;
      ADD H4.x, -H4.y, C4.w;
      MUL R3.xy, R3.xyww, C11.xyww;
      ADD R3.xy, R3.xyww, C11.z;
      TEX H5, R3, TEX2, 2D;
      ADD R3.x, R3.x, C11.x;
      TEX H6, R3, TEX2, 2D;
      …
    Cg:
      …
      L2weight = timeval - floor(timeval);
      L1weight = 1.0 - L2weight;
      ocoord1 = floor(timeval)/64.0 + 1.0/128.0;
      ocoord2 = ocoord1 + 1.0/64.0;
      L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));
      L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));
      …
  • Easier to read and modify • Cross-platform • Combine pieces • etc.
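To see what the Cg fragment computes, here is a C port that runs on the CPU. It assumes `tex2` is a 64-texel-wide table sampled at texel centers (which the coordinates `k/64 + 1/128` suggest) and that the elided code blends the two offsets by `L1weight` and `L2weight`; both are assumptions, as are the helper names `floor_d`, `fetch`, and `blended_offset`.

```c
#define TABLE_SIZE 64  /* assumed width of tex2 */

typedef struct { float x, y; } float2;

/* floor() without libm. */
static double floor_d(double x)
{
    long i = (long)x;
    return (x < (double)i) ? (double)(i - 1) : (double)i;
}

/* Nearest-texel fetch from one row of the table at normalized u. */
static float2 fetch(const float2 table[TABLE_SIZE], double u)
{
    int i = (int)(u * TABLE_SIZE);
    if (i < 0) i = 0;
    if (i >= TABLE_SIZE) i = TABLE_SIZE - 1;
    return table[i];
}

/* Mirrors the Cg statements, then blends (the blend is assumed). */
float2 blended_offset(const float2 table[TABLE_SIZE], double timeval)
{
    double L2weight = timeval - floor_d(timeval);
    double L1weight = 1.0 - L2weight;
    double ocoord1 = floor_d(timeval) / 64.0 + 1.0 / 128.0; /* center of texel floor(timeval) */
    double ocoord2 = ocoord1 + 1.0 / 64.0;                  /* center of the next texel */
    float2 L1offset = fetch(table, ocoord1);
    float2 L2offset = fetch(table, ocoord2);
    float2 r = { (float)(L1weight * L1offset.x + L2weight * L2offset.x),
                 (float)(L1weight * L1offset.y + L2weight * L2offset.y) };
    return r;
}
```

The fractional part of `timeval` slides the result smoothly from one table entry to the next, which is why the assembly needs an `FRC` and two `TEX` reads.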

  25. Some points in the design space • CPU languages • C – close to the hardware; general purpose • C++, Java, Lisp – require memory management • RenderMan – specialized for shading • Real-time shading languages • Stanford shading language • Creative Labs shading language

  26. Design strategy • Start with C (and a bit of C++) • Minimizes number of decisions • Gives you known mistakes instead of unknown ones • Allow subsetting of the language • Add features desired for GPUs • To support GPU programming model • To enable high performance • Tweak to make it fit together well

  27. How are GPUs different from CPUs? • GPU is a stream processor • Multiple programmable processing units • Connected by data flows
  [Figure: Application → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer, with Textures as a memory resource]

  28. Cg separates vertex & fragment programs
  [Figure: the stream pipeline from slide 27, with one program attached to the Vertex Processor and another to the Fragment Processor]

  29. Cg programs have two kinds of inputs • Varying inputs (streaming data) • e.g. normal vector – comes with each vertex • This is the default kind of input • Uniform inputs (a.k.a. graphics state) • e.g. modelview matrix • Note: Outputs are always varying
    vout MyVertexProgram(float4 normal, uniform float4x4 modelview) { …
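The varying/uniform split maps naturally onto a C function that is called once per vertex, with the varying input changing on every call and the uniform input fixed for the whole batch. A sketch, assuming hypothetical `float4`/`float4x4` C types and a hypothetical `run_stream` driver; the body (multiplying the normal by `modelview`) is a placeholder for whatever the elided program does, and correct normal transformation generally uses the inverse transpose of the matrix.

```c
typedef struct { float v[4]; } float4;
typedef struct { float m[4][4]; } float4x4;

/* Output structure: like Cg outputs, one value per vertex (varying). */
typedef struct { float4 normal; } vout;

/* C analogue of MyVertexProgram: `normal` is varying (new value per
 * vertex), `modelview` is uniform (same value for the whole batch). */
vout MyVertexProgram(float4 normal, const float4x4 *modelview)
{
    vout o;
    for (int r = 0; r < 4; ++r) {
        o.normal.v[r] = 0.0f;
        for (int c = 0; c < 4; ++c)
            o.normal.v[r] += modelview->m[r][c] * normal.v[c];
    }
    return o;
}

/* What the GPU does conceptually: stream the varying input through the
 * program while the uniform input stays fixed. */
void run_stream(const float4 *normals, vout *outs, int count,
                const float4x4 *modelview)
{
    for (int i = 0; i < count; ++i)
        outs[i] = MyVertexProgram(normals[i], modelview);
}
```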

  30. Binding VP outputs to FP inputs • Let compiler do it • Define a single structure • Use it for vertex-program output • Use it for fragment-program input struct vout { float4 color; float4 texcoord; … };
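In C terms, the single-structure approach means both stages are written against one type, so the fields the vertex stage writes are exactly the fields the fragment stage reads, with the rasterizer interpolating each field in between. A sketch with hypothetical `interpolate_vout` and `fragment_program` functions; the linear interpolation and the premultiply-alpha fragment body are illustrative only.

```c
/* One structure shared by both stages (fields as on the slide). */
typedef struct {
    float color[4];
    float texcoord[4];
} vout;

/* Hypothetical rasterizer step: interpolate every field of the struct
 * between two vertices, t in [0, 1]. */
vout interpolate_vout(const vout *a, const vout *b, float t)
{
    vout r;
    for (int i = 0; i < 4; ++i) {
        r.color[i]    = a->color[i]    + t * (b->color[i]    - a->color[i]);
        r.texcoord[i] = a->texcoord[i] + t * (b->texcoord[i] - a->texcoord[i]);
    }
    return r;
}

/* Fragment "program" taking the same struct as input;
 * as an example it outputs the color with alpha premultiplied. */
void fragment_program(const vout *in, float out_rgba[4])
{
    for (int i = 0; i < 3; ++i)
        out_rgba[i] = in->color[i] * in->color[3];
    out_rgba[3] = in->color[3];
}
```

Adding a field to the struct automatically makes it available to both stages, which is the convenience the compiler-managed binding provides.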

  31. Binding VP outputs to FP inputs • Do it yourself • Specify register bindings for VP outputs • Specify register bindings for FP inputs • May introduce HW dependence • Necessary for mixing Cg with assembly struct vout { float4 color: TEX3; float4 texcoord: TEX5; … };

  32. Some inputs and outputs are special • E.g., the position output from the vertex program • This output drives the rasterizer • It must be marked: struct vout { float4 color; float4 texcoord; float4 position : HPOS; };
