This document delves into various GPU technologies and developments as of February 2005. It covers topics such as conditional execution in shaders, the performance of the 3DLabs Realizm 100 AGP, and advances in PCI-Express for enhanced data transfer. The exploration of GPU ray tracing techniques, including the studies on early Z-buffering and computational masks, highlights the evolution of rendering methods. The discussion also addresses the challenges and potential improvements in ray tracing technology, focusing on shader enhancements and memory bandwidth considerations.
Some Things Jeremy Sugerman 22 February 2005
Topics
• Quick GPU Topics
• Conditional Execution
• GPU Ray Tracing
Jeremy Sugerman, FLASHG 22 February 2005

PCI-Express
• PCI-Express solves data transfer problems…

3DLabs Realizm 100 AGP
• Mediocre Fill Rate (About half a 9800XT)
• Reasonable Texture Bandwidth
• Variable Cost Instructions
  • 6 GFLOPS ADD – 0.5 GFLOPS LG2
• Remarkable Readback
• But, No GL_TEXTURE_RECTANGLE_EXT

Conditional Execution
• Depth and Stencil are classic tools
  • Only effective early
• All shaders support predication and KIL
  • No savings in execution time
  • KIL does gruesome things to the pipeline
• Pixel Shader 3.0 has true branching
  • If-Then-Else, Data dependent loops
  • NV4x currently, no ATI until R500

Compute Mask – Z Buffer
• Clear Z to 1.0
• Draw Depth-Only at Z = 0.3
  • KIL where computation will happen
• Draw Color at Z = 0.7
• Very Effective When it Works
• Fragile, Easily Disabled
  • Stays Disabled Until glClear!

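The three drawing steps of the Z-buffer compute mask can be sketched on the CPU. This is a simulation rather than real OpenGL: it assumes a less-than depth test with depth writes disabled during the color pass, and the names `build_mask` and `draw_color` are hypothetical.

```python
# CPU simulation of the Z-buffer compute mask; these are not GL calls.
NEAR_Z, FAR_Z, CLEAR_Z = 0.3, 0.7, 1.0

def build_mask(live):
    """Clear depth to 1.0, then draw depth-only at Z = 0.3, with KIL
    discarding the fragment wherever computation will happen."""
    depth = [[CLEAR_Z for _ in row] for row in live]
    for y, row in enumerate(live):
        for x, is_live in enumerate(row):
            if is_live:
                continue  # KIL: fragment discarded, depth stays at 1.0
            depth[y][x] = NEAR_Z
    return depth

def draw_color(depth, shader):
    """Draw at Z = 0.7 (assumed: less-than test, depth writes off).
    Returns the output image and the number of fragments shaded."""
    out = [[None] * len(row) for row in depth]
    shaded = 0
    for y, row in enumerate(depth):
        for x, z in enumerate(row):
            if FAR_Z < z:  # passes only where the mask left depth at 1.0
                out[y][x] = shader(x, y)
                shaded += 1
    return out, shaded

live = [[True, False, False],
        [False, True, False]]
image, shaded = draw_color(build_mask(live), lambda x, y: x + 10 * y)
print(shaded)  # 2
```

Only the two live pixels reach the shader; the masked pixels fail the depth test against 0.3 before any shading work is done, which is the saving Early-Z is meant to provide.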
Compute Mask - EarlyZ
[Figure: results for NV41 and X800; mask patterns: Random, 2x2 Blocks, 3x3 Blocks, 4x4 Blocks, Wavefront]

Compute Mask – PS3.0
• Rasterize normally with a shader like:
    If (pixel is live) {
      …
      MOV result.color, <output>
    } else {
      MOV result.color, <placeholder> // Or KIL
    }
• Easy to Write
• Must shade all fragments
• Must write a value or KIL for all fragments

Compute Mask – PS 3.0
[Figure: mask patterns: Random, 64x64 Blocks, 32x32 Blocks, 16x16 Blocks, Wavefront]

Pixel Shader 3.0
• Not (yet?) a replacement for Early-Z
• What about loops?
• What about state machines?
    If (fragment is in state a) {
      // Computation 1
    } else {
      // Computation 2
    }
• Will execution time be MAX(a, b) or a + b?

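One way to reason about the MAX(a, b) versus a + b question is a toy cost model. The sketch below assumes, without any claim about how NV4x actually schedules fragments, that fragments run in lockstep batches and that a batch pays for every side of the branch that at least one of its fragments takes. The batch size and the costs are made-up numbers.

```python
def batch_cost(states, cost_a, cost_b):
    """Cost of one lockstep batch: pay for each side any fragment takes."""
    takes_a = any(s == "a" for s in states)
    takes_b = any(s != "a" for s in states)
    return cost_a * takes_a + cost_b * takes_b

def total_cost(states, batch_size, cost_a, cost_b):
    """Sum the batch costs over consecutive groups of fragments."""
    return sum(batch_cost(states[i:i + batch_size], cost_a, cost_b)
               for i in range(0, len(states), batch_size))

coherent = ["a"] * 4 + ["b"] * 4   # each batch agrees on its state
mixed = ["a", "b"] * 4             # every batch contains both states

print(total_cost(coherent, 4, 10, 30))  # 40
print(total_cost(mixed, 4, 10, 30))     # 80
```

Under this assumption a coherent batch costs a or b, while a divergent batch costs a + b, so the answer depends on how the states are laid out across the screen.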
GPU Ray Tracing
• Tim Purcell left us a Brook raycaster
• Tim (Foley) et al. beat on it for DARPA Line-of-Sight
  • Early-Z, 2D Addressing
• Tim and I have forked it again
  • Explore new hardware features
  • Explore new algorithm options
  • Mature, maintainable source base

Demo
• Break for demo…

GPU Ray Tracing – Brute Force
• Initialize Scene Parameters, Geometry (CPU)
• Generate Eye Rays
• Foreach( triangle in the scene )
  • Intersect with all rays
  • Record if it hits closer than any prior triangle
• Shade Hits
• Ray-Triangle kernel is 39 instructions
• Over 100 million intersections per second

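The brute-force loop can be sketched on the CPU as follows. The intersection test here is the standard Möller-Trumbore algorithm, swapped in for illustration; it is not the 39-instruction kernel from the slide, and the helper names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(origin, direction, tri, eps=1e-9):
    """Return the distance t along the ray to the triangle, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def trace(rays, triangles):
    """Outer loop over triangles, inner loop over all rays, as on the slide."""
    hits = [(float("inf"), None)] * len(rays)
    for tri_id, tri in enumerate(triangles):
        for i, (origin, direction) in enumerate(rays):
            t = intersect(origin, direction, tri)
            if t is not None and t < hits[i][0]:
                hits[i] = (t, tri_id)  # closer than any prior triangle
    return hits

triangles = [((-1, -1, 5), (1, -1, 5), (0, 1, 5)),   # far
             ((-1, -1, 2), (1, -1, 2), (0, 1, 2))]   # near
rays = [((0, 0, 0), (0, 0, 1)), ((5, 5, 0), (0, 0, 1))]
print(trace(rays, triangles))
```

The first ray hits both triangles and keeps the nearer one (triangle 1 at t = 2); the second ray misses everything. On the GPU the inner loop is one rendering pass over the ray buffer per triangle.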
GPU Ray Tracing – Uniform Grid
• Initialize Scene Parameters, Geometry (CPU)
• Generate Eye Rays
• While (Any Rays Are Live)
  • Traverse the traversing rays
  • Intersect the intersecting rays
• Shade Hits
• Equivalent to ~14 million ray-triangles per second on our scenes.

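The control flow of the uniform-grid loop can be sketched with a toy model. Everything here is a simplification chosen for illustration: the grid is a 1D row of cells, each cell holds at most one primitive, and a per-ray `can_hit` set stands in for the real ray-triangle test. The state names and helpers are hypothetical.

```python
TRAVERSE, INTERSECT, DONE = "traverse", "intersect", "done"

def make_ray(can_hit):
    """A ray starts just outside the grid, in the traversing state."""
    return {"state": TRAVERSE, "cell": -1, "hit": None, "can_hit": can_hit}

def traverse(ray, cells):
    """Advance one cell; switch state on leaving the grid or finding geometry."""
    ray["cell"] += 1
    if ray["cell"] >= len(cells):
        ray["state"] = DONE              # left the grid without a hit
    elif cells[ray["cell"]] is not None:
        ray["state"] = INTERSECT         # non-empty cell: test its contents

def intersect(ray, cells):
    """Test the primitive in the current cell (stand-in for ray-triangle)."""
    prim = cells[ray["cell"]]
    if prim in ray["can_hit"]:
        ray["hit"] = prim
        ray["state"] = DONE
    else:
        ray["state"] = TRAVERSE          # missed: keep walking the grid

def render(rays, cells):
    iterations = 0
    while any(r["state"] != DONE for r in rays):   # "any rays live?"
        for r in rays:                             # traverse pass
            if r["state"] == TRAVERSE:
                traverse(r, cells)
        for r in rays:                             # intersect pass
            if r["state"] == INTERSECT:
                intersect(r, cells)
        iterations += 1
    return iterations

rays = [make_ray({"B"}), make_ray({"A"})]
print(render(rays, [None, "A", None, "B"]))  # 4
print([r["hit"] for r in rays])              # ['B', 'A']
```

The second ray finishes after two iterations, but the loop keeps issuing both passes until the slowest ray is done, which is why the liveness check and the masking of finished rays matter for performance.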
“Any Live Rays?”
• Fundamentally a reduction
  • Sum across all rays
  • Readback to CPU
  • Many passes to do a GPU reduction
• Could try occlusion query
  • Kernel that just KIL’s on dead rays
  • Still an extra pass
• GPU global counter registers would be cool
• Equivalent to 24 million ray-triangles per second when skipped.

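To see why the reduction takes many passes, here is a CPU sketch of a multipass sum. It assumes each pass shrinks a square, power-of-two buffer by summing 2x2 blocks; the actual reduction kernel used by the raytracer may differ, and the function names are hypothetical.

```python
def reduce_pass(buf):
    """One pass: each output element sums a 2x2 block of the input."""
    n = len(buf) // 2
    return [[buf[2 * y][2 * x] + buf[2 * y][2 * x + 1]
             + buf[2 * y + 1][2 * x] + buf[2 * y + 1][2 * x + 1]
             for x in range(n)]
            for y in range(n)]

def count_live(live):
    """Reduce a square, power-of-two live mask to a single count.
    Returns (number of live rays, number of passes)."""
    buf = [[1 if v else 0 for v in row] for row in live]
    passes = 0
    while len(buf) > 1:
        buf = reduce_pass(buf)
        passes += 1
    return buf[0][0], passes

live = [[True, False, False, False],
        [False, False, True, False],
        [False, False, False, False],
        [False, True, False, False]]
print(count_live(live))  # (3, 2)
```

With this scheme a 1024x1024 ray buffer needs 10 passes before a single value can be read back, and that cost is paid on every iteration of the outer loop.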
Ping Ponging Buffers
• No read-modify-write causes copies:
    intersectTriangle(in ray, in oldHit, in tri, out hit) {
      if (ray hits tri closer than oldHit) {
        hit = <where ray hits tri>;
      } else {
        hit = oldHit; // No RMW
      }
    }
• Memory and Bandwidth Hungry
• Add conditionals / predication to kernels
• Complicates Early-Z compute masking

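The copy forced by the lack of read-modify-write can be sketched with two buffers that swap roles after every pass. This is a CPU illustration with hypothetical names; `candidate_ts` stands in for the per-ray intersection distances one triangle produces (None on a miss).

```python
def intersect_pass(src, dst, candidate_ts):
    """Kernel stand-in: reads src, writes every element of dst."""
    for i, t in enumerate(candidate_ts):
        if t is not None and t < src[i]:
            dst[i] = t
        else:
            dst[i] = src[i]  # no RMW: the old hit must be copied forward

def closest_hits(num_rays, per_triangle_ts):
    """Run one pass per triangle, ping-ponging between two hit buffers."""
    src = [float("inf")] * num_rays
    dst = [float("inf")] * num_rays
    for candidate_ts in per_triangle_ts:
        intersect_pass(src, dst, candidate_ts)
        src, dst = dst, src  # the buffer just written becomes the next input
    return src

print(closest_hits(3, [[5.0, None, 2.0], [3.0, None, 4.0]]))
# [3.0, inf, 2.0]
```

Every pass writes the whole hit buffer even when most rays miss, which is where the extra memory (two buffers) and bandwidth go. If Early-Z masks out a pixel, that pixel is not copied forward, so the destination buffer holds stale data for it; this is one way masking and ping-ponging interfere.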
Render to Texture
• DirectX has it, OpenGL does not
  • DirectX raytracer bluescreens NV4x drivers
• Every shader draws its results to a pbuffer
  • Copied back to a texture each time
• Superbuffers offered a fix
  • ATI supported them (broken now)
  • ARB killed them
• Framebuffer Objects made it through the ARB
  • Only drivers are preliminary NV4x drivers

GPU Ray Tracer Enhancements
• 2D Addressing (duh)
• kD-Tree Accelerator
• Early-Z and/or PS3.0 for the Accelerators
• Tuning Traverse vs. Intersect vs. Shade
• Occlusion Queries / Fast Reductions
• Shadows
• Tuning Bandwidth
• Shading…