GPU-Assisted Path Tracing

Presentation Transcript

  1. GPU-Assisted Path Tracing • Matthias Boindl, Christian Machacek • Institute of Computer Graphics and Algorithms, Vienna University of Technology

  2. Motivation: Why Path Tracing? • Physically based • Nature provides the reference image • Parallelizable • Sublinear in #objects • Conceptually simple • Can lead to a clean implementation • But: fast implementation on GPUs not trivial

  3. Outline • Path tracing intro • Main steps of the algorithm • Mapping the algorithm to the GPU • How to organize code into kernels • When to launch kernels • How to pass data between kernels • Acceleration structures • Focus on bounding volume hierarchies Christian Machacek

  4. Path Tracing Intro • Like ray tracing, except it… • …supports arbitrary BRDFs • …is stochastic: at each bounce, the new direction is decided randomly • Convergence video From Pharr, Humphreys: PBRT, 2nd ed. (2010)

  5. Path Tracing Pseudocode
       while image not converged
         r = new ray from eye through next pixel
         do
           i = closest intersection of r with scene
           if no i: break
           if i is on a light source: c = c + throughput * emission
           randomly pick new direction and create reflected ray r
           evaluate BRDF at i
           update throughput
         while path throughput high enough
     From Pharr, Humphreys: PBRT, 2nd ed. (2010)
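The inner do/while loop of the pseudocode can be sketched in Python as follows. This is a toy version under loud assumptions: the "scene" is a single emissive surface that every ray hits (or misses with probability `miss_prob`), the BRDF is folded into one scalar `albedo`, and the names `trace_path`, `miss_prob`, `min_throughput`, and `max_bounces` are hypothetical, not taken from the slides.

```python
import random

def trace_path(emission, albedo, miss_prob=0.0, min_throughput=1e-3,
               max_bounces=1000, rng=None):
    """Follow one path through a toy scene and return its radiance estimate."""
    rng = rng or random.Random(0)
    c = 0.0            # radiance accumulated along the path
    throughput = 1.0   # fraction of light the path can still carry
    for _ in range(max_bounces):
        # "i = closest intersection of r with scene": the toy scene
        # reports a miss with probability miss_prob.
        if rng.random() < miss_prob:
            break                      # "if no i: break"
        # The surface is always a light source in this toy scene.
        c += throughput * emission
        # "evaluate BRDF at i" and "update throughput", folded into one factor.
        throughput *= albedo
        if throughput < min_throughput:
            break                      # path throughput no longer high enough
    return c

# With no misses the result approaches the geometric series e / (1 - albedo).
estimate = trace_path(emission=1.0, albedo=0.5)
```

With `emission=1.0` and `albedo=0.5` the estimate stays just below 2, the limit of the geometric series, because the loop stops once the throughput drops under the threshold.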

  7. Megakernel Execution Divergence From Bikker (2013)

  8. Solution: Wavefront Path Tracing • Separate, specialized kernels • Keep a pool of ~1 million paths alive • Work for next stage goes into kernel-specific, compact queues (=4MB index arrays) https://mediatech.aalto.fi/~samuli/
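A CPU-side Python sketch of the queueing idea on this slide: a logic stage sorts path indices into one compact queue per material, and each specialized kernel then runs only over its own queue. All names (`logic_stage`, `run_material_stage`, the material labels) are hypothetical, and 32-bit path indices are an assumption chosen because they reproduce the 4 MB figure for a pool of about one million paths.

```python
from collections import defaultdict

POOL_SIZE = 1 << 20      # about 1 million paths kept alive
BYTES_PER_INDEX = 4      # assumption: 32-bit path indices

def queue_bytes(pool_size=POOL_SIZE, bytes_per_index=BYTES_PER_INDEX):
    """Worst-case size of one queue: every path lands in the same queue."""
    return pool_size * bytes_per_index

def logic_stage(path_materials):
    """Place each path index into the queue of the material it hit."""
    queues = defaultdict(list)
    for path_index, material in enumerate(path_materials):
        queues[material].append(path_index)
    return queues

def run_material_stage(queues, kernels, path_state):
    """Run each specialized kernel over its own compact queue."""
    for material, queue in queues.items():
        kernel = kernels[material]
        for path_index in queue:   # on a GPU: one thread per queue entry
            path_state[path_index] = kernel(path_state[path_index])

# Toy usage: three paths, two materials, throughput as the only path state.
state = [1.0, 1.0, 1.0]
queues = logic_stage(["diffuse", "glass", "diffuse"])
kernels = {"diffuse": lambda t: t * 0.5, "glass": lambda t: t * 0.9}
run_material_stage(queues, kernels, state)
```

Because every entry in a queue needs the same kernel, neighbouring threads execute the same code, which is the point of splitting the megakernel.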

  9. Results • Performance • Execution times • (ms / 1M path segments) Christian Machacek

  10. Limitations and Possible Improvements • Higher memory requirements (+200 MB) • Kernel launch overhead • Dynamic parallelism on GK110 • Use an outer scheduling kernel • No CPU round trip • Launch independent stages side-by-side • CUDA streams • So kernels with little work don’t hog the GPU Christian Machacek

  11. Acceleration Structures • Find nearest intersection in O(log N) • Space partitioning vs. object partitioning • Hybrid methods exist Matthias Boindl

  12. Performance • For interactive rendering, compromise • Traversal performance (build quality) • Construction/Update time • Update or rebuild from scratch • Adapt to GPU environment • Memory architecture • Parallel execution Matthias Boindl

  13. State of the Art • Tero Karras and Timo Aila. 2013. Fast parallel construction of high-quality bounding volume hierarchies. In Proceedings of the 5th High-Performance Graphics Conference (HPG '13). ACM, New York, NY, USA, 89-99. Matthias Boindl

  14. Close the Performance Gap Matthias Boindl

  15. Basic Idea • Fast construction of simple BVH • Generate leaf for each triangle • Reduce SAH cost by modifying tree Matthias Boindl
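To make "SAH cost" concrete, here is a minimal Python sketch of the surface area heuristic evaluated over a whole BVH. The node layout (tuples tagged `"leaf"` or `"inner"`) and the cost constants `c_inner` and `c_tri` are assumptions for illustration; the slides do not state which constants were used.

```python
def surface_area(box):
    """Surface area of an axis-aligned box given as (min_corner, max_corner)."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def union(a, b):
    """Smallest box enclosing boxes a and b."""
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return (lo, hi)

def bounds(node):
    """node is ("leaf", box, triangle_count) or ("inner", left, right)."""
    if node[0] == "leaf":
        return node[1]
    return union(bounds(node[1]), bounds(node[2]))

def sah_cost(root, c_inner=1.2, c_tri=1.0):
    """Expected traversal cost: node areas weighted by per-node work,
    normalized by the root's surface area."""
    root_area = surface_area(bounds(root))

    def visit(node):
        area = surface_area(bounds(node))
        if node[0] == "leaf":
            return c_tri * area * node[2]
        return c_inner * area + visit(node[1]) + visit(node[2])

    return visit(root) / root_area

# Two unit cubes side by side, one triangle each.
a = ("leaf", ((0, 0, 0), (1, 1, 1)), 1)
b = ("leaf", ((1, 0, 0), (2, 1, 1)), 1)
cost = sah_cost(("inner", a, b), c_inner=1.0, c_tri=1.0)
```

A lower value means rays are expected to visit fewer or smaller nodes, so modifying the tree to reduce this number is what improves traversal performance.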

  16. Treelets • Allow local tree modification • A, B, C, F are leaves; D, E, G are internal nodes Matthias Boindl

  17. Treelet Construction • Find root: parallel bottom-up traversal • Start with leaves • Use atomic counter at junctions (internal nodes) • Ensures all children have been processed • Build treelet • Add both children • Pick children with highest surface area • Fixed size: 7 leaf nodes Matthias Boindl
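Both steps on this slide can be sketched sequentially in Python. `bottom_up_order` mimics the atomic counters: the first thread to reach an internal node stops, the second one proceeds, so a node is only processed after both of its children. `form_treelet` grows a treelet by repeatedly expanding the treelet leaf with the largest surface area. The dictionaries `parent`, `children`, and `area` and both function names are hypothetical; a GPU version would use an atomic add instead of the plain counter.

```python
def bottom_up_order(parent, leaves):
    """Return internal nodes in an order where children come before parents."""
    counter = {}
    order = []
    for leaf in leaves:                     # on a GPU: one thread per leaf
        node = parent.get(leaf)
        while node is not None:
            counter[node] = counter.get(node, 0) + 1   # atomic add on a GPU
            if counter[node] < 2:
                break                       # first arrival: sibling not done yet
            order.append(node)              # second arrival: both children done
            node = parent.get(node)
    return order

def form_treelet(root, children, area, max_leaves=7):
    """Grow a treelet below root until it has max_leaves leaves.

    children maps each internal BVH node to its (left, right) pair;
    area maps every node to its surface area."""
    internal = [root]
    leaves = list(children[root])           # add both children of the root
    while len(leaves) < max_leaves:
        expandable = [n for n in leaves if n in children]
        if not expandable:
            break                           # subtree too small for a full treelet
        best = max(expandable, key=lambda n: area[n])
        leaves.remove(best)                 # largest leaf becomes internal
        internal.append(best)
        leaves.extend(children[best])
    return internal, leaves

# Toy BVH: complete binary tree, nodes 0..6 internal, nodes 7..14 leaves.
children = {i: (2 * i + 1, 2 * i + 2) for i in range(7)}
area = {0: 12, 1: 10, 2: 8, 3: 6, 4: 2, 5: 5, 6: 1}
area.update({n: 1 for n in range(7, 15)})
internal, leaves = form_treelet(0, children, area)
order = bottom_up_order({1: 0, 2: 0, 3: 1, 4: 1}, [3, 4, 2])
```

A treelet with 7 leaves always has 6 internal nodes, which is what the rearrangement step then optimizes over.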

  18. Rearrange Treelet • Minimize treelet root node surface area • Naive implementation: test each permutation • Better: dynamic programming • Caching of best intermediate results • Start with leaves, then pairs, then triplets, … • Suboptimal subtree construction avoided • Parallelizable as well Matthias Boindl
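The dynamic programming idea can be sketched in Python as a search over subsets of treelet leaves, encoded as bitmasks. The sketch assumes the quantity being minimized is the summed surface area of the treelet's internal nodes, an SAH-style cost in line with "Reduce SAH cost" on slide 15, and that each treelet leaf contributes a fixed cost (zero by default). The function name and these simplifications are assumptions, not the authors' implementation.

```python
def surface_area(box):
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def union(a, b):
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return (lo, hi)

def optimal_treelet_cost(leaf_boxes, leaf_costs=None):
    """Minimum summed internal-node area over all binary trees on the leaves."""
    n = len(leaf_boxes)
    full = (1 << n) - 1
    leaf_costs = leaf_costs or [0.0] * n

    # Bounding box and area of every subset of leaves.
    box = [None] * (full + 1)
    area = [0.0] * (full + 1)
    for s in range(1, full + 1):
        low = s & -s
        i = low.bit_length() - 1
        rest = s ^ low
        box[s] = leaf_boxes[i] if rest == 0 else union(box[rest], leaf_boxes[i])
        area[s] = surface_area(box[s])

    # best[s] caches the cheapest subtree over subset s. Every proper subset
    # of s is numerically smaller, so increasing order visits single leaves,
    # then pairs, triplets, ... before they are needed.
    best = [0.0] * (full + 1)
    for i in range(n):
        best[1 << i] = leaf_costs[i]
    for s in range(1, full + 1):
        if s & (s - 1) == 0:
            continue                      # single leaf, already initialized
        rest = s ^ (s & -s)               # keep the lowest leaf on one side
        best_split = float("inf")
        other = rest
        while other:                      # all non-empty subsets of rest
            best_split = min(best_split, best[s ^ other] + best[other])
            other = (other - 1) & rest
        best[s] = area[s] + best_split
    return best[full]

# Three unit cubes on the x axis: two neighbours and one far away.
boxes = [((0, 0, 0), (1, 1, 1)), ((1, 0, 0), (2, 1, 1)), ((10, 0, 0), (11, 1, 1))]
cost = optimal_treelet_cost(boxes)
```

For 7 leaves there are only 127 subsets, so caching the best result per subset is far cheaper than testing every possible tree shape.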

  19. Results • Gap closed Matthias Boindl

  20. Results • Speed/Quality tradeoff Matthias Boindl

  21. Conclusion • Use specialized kernels • Lower execution divergence • (Better use of instruction cache) • (Fewer registers used simultaneously) • Construct acceleration structures quickly • But not too quickly Matthias Boindl

  22. Thanks for your attention! Institute of Computer Graphics and Algorithms Vienna University of Technology

  23. Results • Speed/Quality tradeoff Matthias Boindl

  24. Logic Kernel • Does not need a queue, operates on all paths • If shadow ray was unblocked, add light contribution • Find material or light source the ray hits • Place path into proper material queue • Russian roulette • If path terminated, accumulate to image • Place path into new path queue • Sample light sources (aka next event estimation) Christian Machacek
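The Russian roulette step can be sketched in Python as below. The survival probability rule (clamped throughput with a floor of `min_survival`) and the scalar throughput are assumptions for illustration; the slide does not say how the probability is chosen. The random number `u` is passed in so the function stays deterministic.

```python
def russian_roulette(throughput, u, min_survival=0.05):
    """Randomly terminate a path without biasing the estimate.

    u is a uniform random number in [0, 1). Returns the rescaled
    throughput if the path survives, or None if it is terminated."""
    p = max(min_survival, min(1.0, throughput))   # survival probability
    if u >= p:
        return None            # terminated: accumulate to image, start new path
    return throughput / p      # survivor is boosted by 1 / p

survived = russian_roulette(0.5, u=0.1)
terminated = russian_roulette(0.5, u=0.9)
```

The expected value is unchanged: a path survives with probability p and then carries throughput / p, so p * (throughput / p) equals the original throughput.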

  25. New Path Kernel • Generate a new image-space sample • Generate camera ray • Place it into extension ray cast queue • Initialize path state • Throughput • Pixel position • etc. Christian Machacek

  26. Material Kernels • Generate incoming direction • Evaluate light contribution based on light sample generated in the logic kernel • We haven’t cast the shadow ray yet! • For MIS: p(light sample) from the BSDF • Discard BSDF stack • Queue • extension ray • (shadow ray) Christian Machacek
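For the MIS bullet, a short Python sketch of how the BSDF's probability for the light sample enters the weight. The balance heuristic is used here as an assumption; the slide does not name the heuristic, and the function and parameter names are hypothetical.

```python
def balance_heuristic(pdf_this, pdf_other):
    """MIS weight for a sample drawn with density pdf_this when the same
    direction could also have been drawn with density pdf_other."""
    total = pdf_this + pdf_other
    return pdf_this / total if total > 0.0 else 0.0

# The light sample came from the logic kernel with density p_light. The
# material kernel asks the BSDF how likely it was to pick that direction.
p_light = 1.0
p_bsdf_of_light_dir = 3.0
w_light = balance_heuristic(p_light, p_bsdf_of_light_dir)
w_bsdf = balance_heuristic(p_bsdf_of_light_dir, p_light)
```

The two weights sum to one, so light that both strategies can find is not counted twice.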

  27. Ray Cast Kernels • Extension rays • Find first intersection against scene geometry • Store hit data into path state • Shadow rays • Blocked or not? Christian Machacek
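The difference between the two ray types can be sketched in Python with spheres standing in for scene geometry (an assumption; a real ray cast kernel would traverse a BVH over triangles). An extension ray needs the closest hit, while a shadow ray only needs to know whether anything lies in front of the light. All names are hypothetical.

```python
import math

def hit_sphere(origin, direction, center, radius, t_min=1e-4, t_max=math.inf):
    """Nearest hit distance in (t_min, t_max), or None.
    direction is assumed to be unit length."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t_min < t < t_max:
            return t
    return None

def cast_extension_ray(origin, direction, spheres):
    """Find the first intersection: index and distance of the closest hit."""
    best_t, best_id = math.inf, None
    for i, (center, radius) in enumerate(spheres):
        t = hit_sphere(origin, direction, center, radius, t_max=best_t)
        if t is not None:
            best_t, best_id = t, i
    return best_id, best_t

def shadow_ray_blocked(origin, direction, max_distance, spheres):
    """Blocked or not: any hit closer than the light is enough."""
    return any(
        hit_sphere(origin, direction, c, r, t_max=max_distance) is not None
        for c, r in spheres
    )

spheres = [((0.0, 0.0, 5.0), 1.0), ((0.0, 0.0, 10.0), 1.0)]
hit = cast_extension_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), spheres)
near_light = shadow_ray_blocked((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 3.0, spheres)
far_light = shadow_ray_blocked((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 20.0, spheres)
```

The shadow query can stop at the first hit it finds, which is why it is typically cheaper than the extension query.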