
Ray Tracing on Programmable GPUs


Presentation Transcript


  1. Ray Tracing on Programmable GPUs

  2. Graphics Pipeline • Traditional Pipeline: Application → Command → Geometry → Rasterization → Texture → Fragment → Display • Programmable Fragment Pipeline: Fragment Input, Textures, Registers → Fragment Program → Fragment Output

  3. Fragment Processing Features • Rich instruction set • No branching yet (see PS 3.0 spec) • Floating point • Arithmetic • Texture memory • Dependent texturing • Multipass rendering flow control • NV_OCCLUSION_QUERY

  4. The Ray Engine [Carr02]

  5. Ray Engine – Main Idea • Ray-triangle intersection done by GPU • CPU-based renderer does everything else

  6. Ray Engine Algorithm • Renderer sends ray textures to GPU • Ray origin and direction • Renderer sends ‘triangles’ down pipeline • Vertex interpolants of a screen aligned quad • GPU performs ray-triangle intersection tests • Short fragment program • Framebuffer stores closest hit point • Renderer reads back closest hit
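The ray-triangle test and the "closest hit" bookkeeping described above can be sketched on the CPU as follows. This is only an illustration: it uses the well-known Möller–Trumbore test rather than the Ray Engine's actual fragment program, and the function names (`intersect`, `closest_hit`) are hypothetical.

```python
# Illustrative sketch only: Moller-Trumbore ray-triangle test, swapped in for
# the Ray Engine's fragment program, which the slides do not show.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the hit distance t along the ray, or None if the ray misses."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:           # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv          # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def closest_hit(origin, direction, triangles):
    """Keep the nearest hit, playing the role of the framebuffer in the slides."""
    best = None
    for index, (v0, v1, v2) in enumerate(triangles):
        t = intersect(origin, direction, v0, v1, v2)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best
```

On the GPU the loop in `closest_hit` is replaced by one pass per triangle over all rays at once; the sketch only shows what each pass computes for a single ray.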

  7. Pixel Shader 1.4 Implementation Fixed Point Precision Problems

  8. Full Precision Simulations

  9. Ray Engine Results • Radeon 8500 fixed point implementation • 114 M ray-triangle intersections / s • Full precision simulator • 115K – 200K rays / s

  10. Ray Engine Summary • GPU performs ray-triangle intersection • CPU-based renderer does everything else • Raw ray-triangle intersection rate is faster than CPU based approach • Total rays processed per second is slower than CPU • Readback limited

  11. Streaming Ray Tracer [Purcell02]

  12. Streaming Ray Tracer – Main Ideas • Entire ray tracing computation can be done efficiently on the GPU • Minimal host interaction • Stream processor abstraction for programmable fragment processor

  13. Streaming Ray Tracer • Generate Eye Rays (Camera) • Traverse Acceleration Structure (Grid) • Intersect Triangles (Triangles) • Shade Hits and Generate Shading Rays (Materials)

  14. GPU Abstraction • Texture memory is memory • Think of dependent texture fetches as pointer dereferencing • Programmable fragment processor is a programmable stream processor • Think of multipass rendering as stream and kernel programming

  15. Texture Memory Organization • Uniform Grid: 3D Luminance Texture, one entry per voxel (vox0 … voxM) holding an offset into the triangle list (0, 3, 11, 38, …, 564) • Triangle List: 1D Luminance Texture, triangle indices grouped by voxel (labels vox0, vox2; values 0, 3, 1, 3, 7, 21, 216, …) • Triangles: 3x 1D RGB Textures (v0, v1, v2), each holding one xyz vertex per triangle (tri0 … triN)
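The layout above, combined with the "dependent texture fetch as pointer dereference" idea, can be sketched with plain Python lists standing in for textures. All data here is made up, the 3D grid is flattened to a single index for brevity, and the `END` sentinel that terminates each voxel's list is an assumption, since the slide does not say how list ends are marked.

```python
# Hypothetical data: lists stand in for textures, indexing stands in for a fetch.
END = -1  # assumed end-of-list marker; not stated on the slide

# "Uniform grid" texture: voxel index -> offset into the triangle list.
grid = [0, 3, 4]

# "Triangle list" texture: runs of triangle indices, one run per voxel.
tri_list = [0, 2, END,   # voxel 0 holds triangles 0 and 2
            END,         # voxel 1 is empty
            1, 2, END]   # voxel 2 holds triangles 1 and 2

# "Triangle" textures: one xyz vertex per triangle in each of v0, v1, v2.
v0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
v1 = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
v2 = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]

def triangles_in_voxel(voxel):
    """Follow grid -> triangle list -> vertex textures, one fetch at a time."""
    pointer = grid[voxel]                  # first dependent fetch
    found = []
    while tri_list[pointer] != END:
        tri = tri_list[pointer]            # second dependent fetch
        found.append((tri, v0[tri], v1[tri], v2[tri]))  # three vertex fetches
        pointer += 1
    return found
```

Each level of indexing corresponds to one texture read whose address came from a previous texture read, which is what makes the scheme work without pointers.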

  16. Stream Programming Model • Programmable fragment processor is essentially a stream processor • Kernels and streams • Stream is a set of data records • Kernels operate on records • Streams connect kernels together • Kernels can read global memory • Diagram: input record stream → kernel (globals) → kernel (globals) → output record stream
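A minimal sketch of the kernel-and-stream idea, assuming a stream is just a list of records and a kernel is a function of one record plus read-only globals. The camera model and all names (`run_kernel`, `generate_eye_ray`, `normalize_ray`) are hypothetical and chosen only for illustration.

```python
import math

def run_kernel(kernel, stream, globals_):
    """Apply one kernel independently to every record of the input stream."""
    return [kernel(record, globals_) for record in stream]

def generate_eye_ray(pixel, cam):
    """Kernel 1: pixel record -> ray record. Assumes an image plane at z = -1."""
    x, y = pixel
    u = 2.0 * (x + 0.5) / cam["width"] - 1.0
    v = 2.0 * (y + 0.5) / cam["height"] - 1.0
    return {"origin": cam["eye"], "dir": (u, v, -1.0)}

def normalize_ray(ray, _globals):
    """Kernel 2: ray record -> ray record with a unit-length direction."""
    dx, dy, dz = ray["dir"]
    n = math.sqrt(dx * dx + dy * dy + dz * dz)
    return {"origin": ray["origin"], "dir": (dx / n, dy / n, dz / n)}

# Globals are read-only data visible to every kernel invocation.
camera = {"eye": (0.0, 0.0, 0.0), "width": 2, "height": 2}

# The input stream: one record per pixel.
pixels = [(x, y) for y in range(2) for x in range(2)]

# The output stream of one kernel is the input stream of the next.
rays = run_kernel(normalize_ray, run_kernel(generate_eye_ray, pixels, camera), None)
```

Because a kernel sees only its own record and the globals, every record can be processed independently, which is the property that lets the fragment processor run the kernel once per fragment in parallel.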

  17. Streaming Flow Control • Application and Geometry Stages → Rasterization • Fragments (Input Stream) • Fragment Program (Kernel) • Texture (Globals) • Fragment Program Output (Output Stream)

  18. Multiple Rendering Passes Pass 1 Generate Eye Rays Draw quad Rasterize

  19. Multiple Rendering Passes Pass 1 Generate Eye Rays Run fragment program

  20. Multiple Rendering Passes Pass 1 Generate Eye Rays Save to offscreen buffer (rays)

  21. Multiple Rendering Passes Pass 2 Traverse Draw quad Rasterize

  22. Multiple Rendering Passes Pass 2 Traverse Run fragment program Restore (rays)

  23. Multiple Rendering Passes Pass 2 Traverse Save to offscreen buffer (ray voxel pr)
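The two passes above can be sketched as a host-side sequence in which the buffer saved by one pass is the input restored by the next. This is a rough illustration only: `render_pass` stands in for "draw quad, rasterize, run fragment program", the `traverse` kernel is a placeholder that does no real grid traversal, and the record fields are hypothetical.

```python
def render_pass(fragment_program, in_buffer):
    """One pass: run the program once per pixel and save to a new offscreen buffer."""
    return [fragment_program(texel) for texel in in_buffer]

def generate_eye_rays(pixel_index):
    """Pass 1 program: emit one ray record per pixel (placeholder contents)."""
    return {"ray": pixel_index}

def traverse(record):
    """Pass 2 program: placeholder that tags each ray with a made-up voxel id."""
    return {"ray": record["ray"], "voxel": record["ray"] % 4}

width, height = 4, 2
pixels = list(range(width * height))  # what rasterizing the quad produces

rays = render_pass(generate_eye_rays, pixels)   # pass 1, saved offscreen
ray_voxel = render_pass(traverse, rays)         # pass 2 restores rays, saves new state
```

A pass never writes to the buffer it reads from, so each pass produces a fresh buffer that becomes the next pass's input.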

  24. Demos Rendered using a Radeon 9700 Pro

  25. Demos Rendered using a Radeon 9700 Pro

  26. Demos Rendered using a Radeon 9700 Pro

  27. Demos Rendered using a Radeon 9700 Pro

  28. Streaming Ray Tracer Results • Simulations • 50M – 200M ray-triangle intersections/s • Radeon 9700 Pro Implementation • 100M ray-triangle intersections/s • 300K – 4.0M rays/s

  29. Streaming Ray Tracer Summary • Entire ray tracing computation can be mapped efficiently to the GPU • Stream processor is a good abstraction for a programmable fragment processor

  30. Dedicated Hardware Ray Tracing

  31. Ray Tracing in Hardware • Volume Rendering • [Meissner98],[Pfister99] • Offline Rendering • [ART01],[ART02] • Interactive Rendering • [Schmittler02]

  32. SaarCOR – Main Idea • Scalable and efficient real time hardware ray tracer • Implementation based on Saarland RTRT

  33. SaarCOR Implementation • Packet based ray tracer • Several custom cores • Computational units • Traversal, intersection, ray generation and shading • Memory units • Memory controller, caches, routers • Multithreaded • Standard DRAM memory on board • Virtual memory support for large scenes • Support for programmable shading

  34. SaarCOR Architecture

  35. SaarCOR Test Scenes

  36. Simulated Performance • Standard 4-pipeline SaarCOR • 100M – 400M rays/s • Frame rates shown for the test scenes: 137 fps, 59 fps, 44 fps, 170 fps

  37. Simulated Bandwidth Usage (columns: No VMA, With VMA, PCI) • 1.9MB, 2.5MB, 0.03MB • 26.6MB, 34.1MB, 0.91MB • 2.1MB, 2.6MB, 0.02MB • 6.1MB, 7.7MB, 0.14MB

  38. SaarCOR Summary • Scalable and efficient • Requires fewer FP units than GeForce3 • Low bandwidth requirements • Hides latency through multithreading • Fast frame rates

  39. Conclusions for Part I

  40. Conclusions • Real time ray tracing advantages • Physically correct renderings • High geometric complexity • Shading flexibility • Several options for real time ray tracing • Software, GPU, Hardware

  41. Backup

  42. Acknowledgments • Ian Buck, Bill Mark, Pat Hanrahan • James ‘RTD’ Percy, Pradeep Sen, Eric Chan • Matt Papakipos, Kurt Akeley - NVIDIA • Bob Drebin, Mark Peercy – ATI • Sponsors • ATI, MERL, NVIDIA, Sony, Sun • DARPA

  43. Ray-Triangle Intersection as a Crossbar

  44. Rasterization as a Crossbar
