Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination
    Presentation Transcript
    1. Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination. Ingo Wald, Carsten Benthin, Philipp Slusallek, Saarland University

    2. First: What is Ray Tracing ? Ray Generation → Ray Traversal → Intersection → Shading → Framebuffer
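
The pipeline stages named on this slide can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the pinhole camera, the single-sphere scene, the depth-based shading, and all function names are assumptions made for the example.

```python
import math

def generate_ray(x, y, width, height):
    """Ray generation: pinhole camera at the origin looking down -z (assumed setup)."""
    # Map the pixel centre to [-1, 1] image-plane coordinates.
    px = (x + 0.5) / width * 2.0 - 1.0
    py = 1.0 - (y + 0.5) / height * 2.0
    length = math.sqrt(px * px + py * py + 1.0)
    return (0.0, 0.0, 0.0), (px / length, py / length, -1.0 / length)

def intersect_sphere(origin, direction, center, radius):
    """Intersection: nearest positive hit distance along a unit-length ray, or None."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None

def shade(t):
    """Shading: a simple depth-based grey value in [0, 1] (illustrative only)."""
    return 0.0 if t is None else max(0.0, 1.0 - t / 10.0)

def render(width, height):
    """Framebuffer: one shaded value per pixel."""
    sphere = ((0.0, 0.0, -3.0), 1.0)  # hypothetical scene: one unit sphere
    framebuffer = []
    for y in range(height):
        row = []
        for x in range(width):
            origin, direction = generate_ray(x, y, width, height)
            # Traversal is trivial here because the scene holds a single object.
            t = intersect_sphere(origin, direction, *sphere)
            row.append(shade(t))
        framebuffer.append(row)
    return framebuffer
```

A real system replaces the trivial traversal step with an acceleration structure such as the kd-trees mentioned later in the course.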

    3. Agenda • Introduction & Motivation • Why Interactive Ray Tracing at all ? • Part I – Interactive Ray Tracing Architectures • Software Ray Tracing • Ray Tracing on Programmable GPUs • Dedicated Ray Tracing Hardware • Part II – Advanced Ray Tracing Issues • Handling Dynamic Scenes • The OpenRT Interactive Ray Tracing API • Part III – New Applications • Industrial Application: Interactive Visualization of Car Headlights • Interactive Global Illumination • Summary and Conclusions Afrigraph 2003

    4. Why Interactive Ray Tracing ?

    5. We have NVidia – so what do we need Ray Tracing for ? • Because it is high quality… • Fully Programmable and Arbitrary Shading Operations • All operations performed in floating point • Flexibility: Can shoot arbitrary Rays • Shadows, reflections, refractions, … • Even suitable for global illumination • Simple Programming Model • No need for multiple passes or OpenGL ‘tricks’ • For indirect effects (like shadows): just shoot a ray ! • Automatic ‘correctness’ • No need for approximations (like reflection maps) → Ray tracing is a much more flexible and powerful rendering algorithm than ‘classical’ triangle rasterization

    6. We have NVidia – so what do we need Ray Tracing for ? • But not only that: It’s also efficient ! • Logarithmic in scene complexity • Useful for increasingly complex scenes (“1 Mtri, no problem !” …) • No multiple rendering passes • ‘Automatic’ Visibility Culling & Occlusion Culling • Hidden geometry not even touched … • Depth complexity not an issue • No overdraw, shading performed exactly once per ray • Very useful for increasingly costly shading • Small bandwidth requirements (if you do it right…) • Memory access coherence + culling + single shading + …

    7. We have NVidia – so what do we need Ray Tracing for ? To summarize: • … it’s highly flexible • … it’s high-quality • … it’s efficient • And: All of that combines automatically • Can do some of that sometimes in HW, but usually not all together

    8. “If it’s so good, then why isn’t it real ?” • 1.) Better asymptotic complexity, but huge constants • 1 ray ~ 1000 CPU cycles • Runs on hardware that it doesn’t really fit… • Uses only a tiny fraction of today’s CPUs, no parallelism, … • Need many rays/sec for full interactivity • ~ 1 Mpix/frame * 4-fold antialiasing * 25 frames/sec * 10 rays/pixel → One billion rays per second … • 2.) Graphics users don’t have a choice • Rasterization has highly sophisticated HW implementations → HW technology for rasterization is 10 years ahead of RT HW… • There is no interactive ray tracing chip (yet), no matter the cost… • All applications are designed for OpenGL → There is no market for interactive ray tracing (really ?) • Still more money/time/effort spent on improving rasterization
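
The "one billion rays per second" estimate follows directly from the figures on this slide. The short calculation below only multiplies the slide's own numbers; the variable names are illustrative.

```python
# Figures taken from the slide.
pixels_per_frame = 1_000_000      # ~1 Mpix/frame
antialiasing_samples = 4          # 4-fold antialiasing
frames_per_second = 25
rays_per_pixel = 10
cycles_per_ray = 1000             # "1 ray ~ 1000 CPU cycles"

rays_per_second = (pixels_per_frame * antialiasing_samples
                   * frames_per_second * rays_per_pixel)
cycles_per_second = rays_per_second * cycles_per_ray

print(f"{rays_per_second:,} rays/s")      # one billion rays per second
print(f"{cycles_per_second:,} cycles/s")  # CPU cycles needed at ~1000 cycles per ray
```

At roughly 1000 cycles per ray this works out to about 10^12 CPU cycles per second, which is why the slide speaks of "huge constants".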

    9. Why is there no Ray Tracing Hardware ? Because Graphics hardware evolved 20 years ago ! • And: Rasterization was the better choice back then… • Small scenes → (asymptotic) complexity doesn’t matter for small N • Large triangles • Coherence: incremental ops & interpolation, low bandwidth • Simple (integer) operations, highly pipelined • FPU requirements of ray tracing unthinkable 10 years ago… • No fragment ops except interpolation • Programmability not an issue → Very deep pipelines: no dependencies, no branches, no nothing, … • Can be built in HW very efficiently, very fast, very cheaply • Note: All of this is changing today ! • E.g., today a GeForce 3 already has more FPU power than any CPU…

    10. Today’s State of the Art in Realtime Ray Tracing Software Implementations are slowly becoming available • Michael Muuss, Army Research Labs • Huge Cluster of SGI machines… • Parker et al., University of Utah • 32-128 CPU SGI Origin • Saarland University • 4 dual PIIIs in 2000, up to 24 dual Athlon 1800+ today Hardware Architectures are already being designed • SaarCOR (Schmittler et al., HWWS 2002) • Ray Tracing on Programmable GPUs (Purcell, SIGGRAPH 2002) • Hybrid Software/GPU system (Hart, HWWS 2002) • Several alternatives for future realtime ray tracing • Can’t yet decide which is best, only know: “It’ll come”

    11. Today’s State of the Art in Realtime Ray Tracing • Even today, IRT solves tasks that even high-end graphics hardware still cannot handle ! • Highly complex models (Muuss, Utah, Saarland [RW2001]) • High-quality Isosurface and Volume Visualization (Utah) • Shadows, reflections, arbitrary shading… [Saarland, Utah] • High-quality reflection simulation of car headlights [PGV2002] • Interactive Global Illumination [RW2002]

    12. Today’s State of the Art: Some Snapshots

    13. Video

    14. Part I: Different Approaches to Realtime Ray Tracing

    15. Different Approaches to Realtime Ray Tracing Basically three choices: • Pure Software Implementations • Today: Highly parallel • Shared Memory (Utah), or PC Clusters (Saarland) • Future: Single PC ? • Moore’s Law also holds for CPUs ! • Perhaps with streaming co-processors (e.g. “SSE++”) • Mixed SW/HW: RT on Programmable GPUs • Purcell et al., Stanford • Converges to the ‘coprocessor’ approach • Pure HW • Dedicated RT hardware (Schmittler et al., SaarCOR) • Summarize all three approaches

    16. Alternative I: Software Ray Tracing (using the Saarland engine as an example)

    17. The OpenRT Interactive Ray Tracing Engine Features of OpenRT: • Highly efficient implementation of RT kernels • On a single Athlon MP 1800+ CPU: ~ 500,000 to 1.5 million rays per second for average models (100 ktri – 1 Mtri) • Up to the 10 million rps (rays/sec) range (no shading, simple scenes) • Sophisticated parallelization on a cluster of PCs • Dynamic load-balancing • Using up to 24 dual-Athlon MP 1800+ or 25 dual P4 Xeon 2.4GHz • Dynamically loadable, fully programmable Shaders • Arbitrary C-code shading, arbitrary rays • RenderMan-like Shading Language • Can handle dynamic scenes (later) • OpenGL-like API (later)

    18. Where does the speed come from ? Speed depends on several factors… • Using fastest available hardware • Fast CPUs, and many CPUs • Good algorithms – Avoid operations in the first place • Fast Intersection and Traversal (kd-trees) • Minimize intersections and traversal steps with high-quality BSPs • Just as important – Make sure you’re using your silicon correctly ! • Highly efficient implementation • Machine-dependent code, if necessary (SSE)

    19. Where does the speed come from ? Keep the Computational Units busy ! • Make sure the CPU doesn’t stall • Avoiding pipeline stalls has top priority • Look at memory, caches and bandwidth !!! • Example: Cache miss during triangle intersection costs about 4 times as much as the computations themselves !!! • Packing, aligning, cache-friendly data layout, prefetching, … • But: no details here • Already covered that at Afrigraph 2001 • It’s not one single method, it’s more of a principle

    20. Distributed Ray Tracing • One CPU still not fast enough • 1 Mray/sec is fast, but not enough • Need more CPUs → Clusters are cheap ($20k-$50k) • Many approaches: • Static vs dynamic load balancing • Object-space vs image-space vs ray-based task partitioning, … • Pixel-interleaved (load balancing) vs tiles (coherence) • … • Problem: Interactivity constraint • Have to finish whole frame in 1/10th of a second • Little time for sophisticated reordering/scheduling

    21. Distributed Ray Tracing Our approach (mostly Carsten Benthin) • Image-based task partitioning → Break image up into ‘tiles’ (usually 16x16 or 32x32) • Since API: Can dynamically change task partitioning scheme • Strongly varying workload → Need dynamic load balancing: Let clients ask for work … • Have to care about network latencies • (10 ms network latency = 10,000 rays !) • Highly efficient networking/communication code • Double-buffering, prefetching, packing, streaming, asynchronous sending and rendering, interleaving of different tasks, multithreading, …
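
The "let clients ask for work" idea can be sketched with a shared work queue. This is only an illustration under stated assumptions: threads in one process stand in for networked cluster clients, and `make_tiles`, `render_frame`, and `render_tile` are hypothetical names, not part of OpenRT.

```python
import queue
import threading

def make_tiles(width, height, tile_size):
    """Split the image into tile rectangles (x, y, w, h)."""
    return [(x, y, min(tile_size, width - x), min(tile_size, height - y))
            for y in range(0, height, tile_size)
            for x in range(0, width, tile_size)]

def render_frame(width, height, tile_size, num_clients, render_tile):
    """Dynamic load balancing: each client pulls the next tile when it is idle."""
    work = queue.Queue()
    for tile in make_tiles(width, height, tile_size):
        work.put(tile)
    results = {}
    lock = threading.Lock()

    def client(client_id):
        while True:
            try:
                tile = work.get_nowait()  # the client asks for work
            except queue.Empty:
                return                    # no tiles left in this frame
            pixels = render_tile(tile)
            with lock:
                results[tile] = (client_id, pixels)

    threads = [threading.Thread(target=client, args=(i,))
               for i in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Usage: a stand-in "renderer" that just reports the tile's pixel count.
frame = render_frame(640, 480, 32, 4, lambda tile: tile[2] * tile[3])
```

Because fast clients simply come back for more tiles, expensive tiles do not stall the rest of the frame. A real cluster would also have to hide the network latency the slide warns about, for example by prefetching the next tile while the current one is rendering.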

    22. Distributed Ray Tracing: Results • Can efficiently use many CPUs • 32x32 tiles at 640x480 = 300 tiles → enough for many CPUs • Usually limiting factor: Pixels/second (not rays/sec) • Bandwidth limited at server: 640x480 at 10-15 frames/sec • For < 10 fps: Usually achieve 90-99% client utilization • Client bandwidth usually not an issue … (100 Mbit) • Rendering Complexity helps ! • More costly tiles = better compute/BW ratio, fewer pixels/sec • Can use more CPUs without hitting bandwidth limit • Doubling rays/pixel easier than doubling framerate • Framerate scales linearly only up to max framerate • But always scales linearly in rays/pixel • Better networking hardware would definitely help
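
The server-side bandwidth limit can be estimated with simple arithmetic. This is a rough sketch, assuming uncompressed 24-bit RGB pixels and a 100 Mbit/s link at the server; the slide states 100 Mbit only for the clients, so both values are assumptions.

```python
width, height = 640, 480
bytes_per_pixel = 3            # assumption: 24-bit RGB, no compression
link_bits_per_second = 100e6   # assumption: 100 Mbit/s link at the server

bits_per_frame = width * height * bytes_per_pixel * 8
max_fps = link_bits_per_second / bits_per_frame

print(f"{bits_per_frame / 1e6:.2f} Mbit per frame")
print(f"about {max_fps:.1f} frames/sec before the link saturates")
```

Under these assumptions the link saturates at about 13.6 frames/sec, in line with the 10-15 frames/sec range quoted on the slide. It also shows why adding rays per pixel scales better than adding frames: more rays cost compute time but no extra pixel traffic.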

    23. Realtime Ray Tracing Approach II: Ray Tracing on Programmable GPUs

    24. Ray Tracing on Programmable GPUs Graphics Hardware today • GPUs are extremely powerful • Already more transistors than a P4 • Full IEEE floating point ! • Many, many, many parallel FPUs • Moore’s Law: Faster growth than for CPUs • GPUs become more and more programmable • First: ‘Register Combiners’ • Then: ‘Vertex Shaders’ • Programmable per vertex • Linear interpolation between the vertices • Today: ‘Pixel Shaders’, ‘Fragment Programs’ • Fully programmable for each fragment

    25. Ray Tracing on Programmable GPUs GPU programmability today: • Full IEEE • SIMD computations • Access to ‘memory’ (textures) in every instruction • Multiple indirections (pointer chasing) now possible • “dependent texture reads” • Still: Several restrictions • Conditionals, loops, recursion, dependent texture writes … • Typically programmed in ‘GPU assembler’ • Most recent: High-level ‘meta’ languages • E.g. ‘Cg’ (‘C’ for GPUs)

    26. Streaming Computations on Programmable GPUs Idea: Use GPU as streaming co-processor • Don’t use it for rasterizing at all… • Pixels form a ‘stream’ of elements • Apply small program (‘kernel’) for whole stream • Render screen-aligned quad with a fragment shader • Fragment program executed for each screen pixel • Each pixel operates on different data • Read data from textures • Screen-aligned textures: 1 texel for each pixel • Output to framebuffer: 1 ‘pixel’ for each fragment program • Feedback Loop: Copy framebuffer to textures • Future: Directly write into textures
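
The streaming model on this slide can be mimicked on the CPU. In this minimal sketch, plain Python lists stand in for textures and the framebuffer; `run_pass` and `increment_kernel` are hypothetical names chosen for the example, and no actual GPU API is involved.

```python
def run_pass(kernel, textures, width, height):
    """Apply `kernel` once per pixel, like a fragment program on a screen-aligned quad."""
    return [[kernel(x, y, textures) for x in range(width)]
            for y in range(height)]

def increment_kernel(x, y, textures):
    # Each "fragment" reads its own texel and writes exactly one output value.
    return textures["state"][y][x] + 1

width, height = 4, 3
textures = {"state": [[0] * width for _ in range(height)]}

for _ in range(5):
    framebuffer = run_pass(increment_kernel, textures, width, height)
    textures["state"] = framebuffer  # feedback loop: framebuffer becomes a texture
```

The kernel never chooses where it writes: its output position is fixed by the pixel it runs for, which is the restriction later slides refer to as missing "dependent writes".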

    27. Ray Tracing on Programmable GPUs [Diagram: a screen-aligned quad drives the fragment kernel (fragment shader), which reads data (texels) from memory (textures) and writes its output to the frame buffer]

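    36. Ray Tracing on Programmable GPUs [Diagram: a screen-aligned quad drives the fragment kernel (fragment shader), which reads data (texels) from memory (textures) and writes its output to the frame buffer; the frame buffer is then fed back into the textures (Feedback !)]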

    37. Ray Tracing on Programmable GPUs Mapping Ray Tracing to the GPU • Use textures for storing ‘variables’ • Ray: ‘origin’ and ‘direction’ 2D textures (3 floats each) • Hit: 2D texture (3 floats: u,v,id) • Vertices: 1D-texture of vertex positions (3 floats each) • Triangles: 1D-texture of vertex ids (1 float each) • Acceleration structure: e.g. 3D-texture for simple grid • Multiple indirections no problem • E.g. use triangle[i] as texture coordinate into vertex[] texture • Up to 4 indirections (grid → triangle list → triangle → vertex)
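
The chain of indirections (grid → triangle list → triangle → vertex) can be illustrated with flat arrays standing in for textures. The concrete layout below, including the `(offset, count)` grid entries and every name, is an assumption made for this sketch and not the layout used by Purcell et al.

```python
# Flat arrays standing in for 1D textures.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
            (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]  # vertex positions
triangles = [0, 1, 2,   1, 3, 2]               # 3 vertex ids per triangle
triangle_lists = [0, 1]                        # triangle ids, grouped per grid cell
grid = {(0, 0, 0): (0, 2)}                     # cell -> (offset, count) into triangle_lists

def fetch_triangle_vertices(cell):
    """Follow the indirections from a grid cell down to vertex positions."""
    offset, count = grid.get(cell, (0, 0))              # 1: grid cell
    result = []
    for k in range(offset, offset + count):
        tri = triangle_lists[k]                         # 2: triangle list entry
        ids = triangles[3 * tri:3 * tri + 3]            # 3: triangle -> vertex ids
        result.append([vertices[i] for i in ids])       # 4: vertex ids -> positions
    return result

print(fetch_triangle_vertices((0, 0, 0)))
```

On the GPU each of these lookups would be a dependent texture read, with the value fetched in one step used as the texture coordinate for the next.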

    38. Ray Tracing on Programmable GPUs Write ‘kernels’ for different ray tracing ops • Ray Generation • Get pixel position from texture coordinates • Somehow get camera settings (e.g. from quad color, or texture) • Compute corresponding ray • Write to ‘origin’, ‘direction’, ‘state’ textures • Triangle Intersection • Read triangle ID to be intersected from state • Get triangle vertices from textures • Intersect • Update state texture • Similar for traversal, triangle list intersection, shading, …
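
The "Intersect" step of the triangle intersection kernel could look like the following sketch. The slides do not say which intersection test is used; the Möller-Trumbore algorithm is swapped in here as a common choice, and it returns the barycentric coordinates (u, v) that the hit texture stores.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moeller-Trumbore test: returns (t, u, v) for a hit, or None for a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return (t, u, v) if t > eps else None   # reject hits behind the origin

hit = intersect_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0),
                         (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

In a GPU kernel the early returns would not be real branches, since the slides list conditionals among the restrictions; the result would instead be selected arithmetically or masked.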

    39. Ray Tracing on Programmable GPUs • Have kernels for ray generation, traversal, intersection, etc. • Each ray is in exactly one ‘state’ • E.g. in ‘intersection’ state • Make sure only rays in ‘correct’ state are processed • E.g. apply intersection kernel only to rays in intersect state • Usual GL masking methods, e.g. stencil bits, early pixel kill etc. → Can generate overhead, but usually ok … • Fragment program can change state of ray • E.g. change from ‘traversal’ to ‘intersection’ in non-empty voxel • Combine different kernels by just calling them in turn • E.g. rendering an ‘intersection’ quad will do one intersection step (but only for rays in intersect state !) • Secondary rays relatively easy for ‘Shader’ kernel • Update origin & direction textures, go back to ‘traversal’ state…
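
The per-ray state machine can be sketched as follows. This is a toy model under loud assumptions: each ray walks its own 1D row of voxels labelled "empty", "miss", or "hit", and all names are hypothetical. It only demonstrates how kernels are called in turn and masked by ray state.

```python
TRAVERSE, INTERSECT, SHADE, DONE = "traverse", "intersect", "shade", "done"

def traversal_kernel(ray, rows):
    """Step to the next voxel; change state in a non-empty voxel or on leaving the grid."""
    ray["voxel"] += 1
    row = rows[ray["row"]]
    if ray["voxel"] >= len(row):
        ray["state"] = SHADE          # left the grid: shade as background
    elif row[ray["voxel"]] != "empty":
        ray["state"] = INTERSECT

def intersection_kernel(ray, rows):
    """Toy intersection: a voxel labelled 'hit' stops the ray, 'miss' resumes traversal."""
    if rows[ray["row"]][ray["voxel"]] == "hit":
        ray["hit"] = ray["voxel"]
        ray["state"] = SHADE
    else:
        ray["state"] = TRAVERSE

def shading_kernel(ray, rows):
    ray["color"] = 1.0 if ray["hit"] is not None else 0.0
    ray["state"] = DONE

def run_pass(kernel, state, rays, rows):
    """Masking: the kernel only touches rays that are in the matching state."""
    active = [r for r in rays if r["state"] == state]
    for r in active:
        kernel(r, rows)
    return len(active)

def trace(rays, rows):
    """Call the kernels in turn until every ray is done; returns the number of passes."""
    passes = 0
    while any(r["state"] != DONE for r in rays):
        for kernel, state in ((traversal_kernel, TRAVERSE),
                              (intersection_kernel, INTERSECT),
                              (shading_kernel, SHADE)):
            run_pass(kernel, state, rays, rows)
            passes += 1
    return passes

rows = [["empty", "miss", "hit"], ["empty", "empty"]]
rays = [{"row": i, "voxel": -1, "state": TRAVERSE, "hit": None, "color": None}
        for i in range(len(rows))]
trace(rays, rows)
```

Every pass is issued for all pixels even when few rays are in the matching state, which illustrates the masking overhead the slide mentions.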

    40. Ray Tracing on Programmable GPUs Results: • Easy to exploit parallelism in the GPU • Many more pixels than fragment pipelines • Comparable performance to single CPU • Even though it’s only a prototype implementation • Limited by fragment pipeline very soon… • Main Limitations • Fragment processing speed • Texture memory • Need many textures for each pixel • Also need to store whole scene in texture • Bandwidth • Number of different states must be small !

    41. Ray Tracing on Programmable GPUs Additional limitations of current GPUs • Bandwidth problems due to missing loops • Often have to write data just to save it for next iteration • Overhead due to missing ‘write’ capability • Accuracy problems – no ints, all floats • E.g. rounding modes when reading IDs from a texture … • Problems due to missing ‘dependent writes’ • Many textures for input, but only one framebuffer for output • Need multiple passes when computing more than 3 values per pixel • Each fragment shader writes to exactly one predetermined position • Hard to do recursive operations with that limitation • Kd-tree construction ?

    42. Ray Tracing on Programmable GPUs Ray tracing on GPUs in the future ? • Many limitations will (probably) change • Loops, branches, dependent writes, int textures, texture memory, early pixel kill … • Performance will increase faster than for CPUs → Might soon be faster than, and as flexible as, ray tracing on a CPU !

    43. Realtime Ray Tracing Approach III: Dedicated Ray Tracing Hardware

    44. Dedicated Ray Tracing Hardware • Relatively low efficiency when using GPU for RT • Many units not needed at all (rasterization, z-buffer, clipping, lighting, …) • Lots of overhead • Programmable units can never be as efficient as dedicated HW • Dedicated ray tracing HW should be more efficient • Building RT HW is feasible today • FPU power not a problem any more (see GeForce 3 FPU performance) • Die size/number of transistors not a problem any more • Main problem: Off-chip bandwidth ! • Already between chip and cache

    45. Dedicated Ray Tracing Hardware Bandwidth: Same problem as in SW • Approach in SW: Bandwidth reduction by Coherent Ray Tracing (packet traversal) • HW: Much larger packets (64x64 vs 2x2 !) • Much bigger bandwidth saving • Target realtime full-screen resolutions • Larger packet sizes not a problem → Lots of coherence • Avoiding overhead simple in HW • Much simpler than with SSE
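
Why larger packets save bandwidth can be seen from a simple count. This sketch assumes ideal coherence, meaning all rays of a packet visit the same acceleration-structure nodes so that each node is fetched once per packet. Real packets diverge, so the numbers are an upper bound on the saving; `node_fetches` and the figure of 30 nodes per ray are illustrative assumptions.

```python
def node_fetches(num_rays, nodes_per_ray, packet_size):
    """Node fetches per frame if every packet fetches each visited node once."""
    packets = -(-num_rays // packet_size)  # ceiling division
    return packets * nodes_per_ray

num_rays = 1024 * 1024   # assumption: one primary ray per pixel at 1024x1024
nodes_per_ray = 30       # assumption: average traversal steps per ray

for label, size in (("single rays", 1), ("2x2 packets", 2 * 2), ("64x64 packets", 64 * 64)):
    print(f"{label:14s} {node_fetches(num_rays, nodes_per_ray, size):>12,} node fetches")
```

Under these assumptions a 2x2 packet cuts fetches by a factor of 4 and a 64x64 packet by a factor of 4096, which is the direction of the saving the slide describes.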

    46. SaarCOR Architecture Features • Based on interactive software ray tracer • Exactly same data structures, … • KD-trees as acceleration structure • Packets of rays to reduce bandwidth • Fixed OpenGL-like shading… • … plus shadow and reflection rays Goals: • Simple low bandwidth memory interface • Half the floating point requirements of GeForce 3 • Achieves frame rates comparable to today’s graphics cards

    47. SaarCOR Architecture: System overview

    48. SaarCOR Architecture: Features • Scalable • Fully pipelined • Multi threading for latency hiding • Simple communication pattern (no routing) • Highly asynchronous

    49. SaarCOR – Current Status Simulation on register-transfer level • Core @ 533 MHz, Memory 64 bit @ 133 MHz (simple SDRAM, no DDR!) • Each pipeline uses 36 FP-units • Standard SaarCOR: • 4 pipelines • 16 threads per pipe • 1 GB/s bandwidth to memory (!) • 272 KB for caches (!) • Four pipes ~ ½ FP-resources of GeForce 3

    50. Issues On-chip memory of standard SaarCOR • Caches: 272 KB • RF for rays: 288 KB • RF for stack: 535 KB Register level simulations only Simple shading only
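
The totals implied by the last two slides can be added up directly. The calculation uses only the figures quoted for the standard SaarCOR configuration; the variable names are illustrative.

```python
# Figures quoted on the SaarCOR status and issues slides.
pipelines = 4
fp_units_per_pipeline = 36
threads_per_pipeline = 16
cache_kb, ray_rf_kb, stack_rf_kb = 272, 288, 535

fp_units = pipelines * fp_units_per_pipeline
threads = pipelines * threads_per_pipeline
on_chip_kb = cache_kb + ray_rf_kb + stack_rf_kb

print(f"{fp_units} FP units, {threads} threads in flight, {on_chip_kb} KB on-chip memory")
```

The register files for rays and stacks together take about three times as much on-chip memory as the caches, which is presumably why on-chip memory is listed under "Issues".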