KD-Tree Acceleration Structures for a GPU Raytracer

Presentation Transcript

  1. KD-Tree Acceleration Structures for a GPU Raytracer Tim Foley, Jeremy Sugerman Stanford University

  2. Motivation • Accelerated raytracing • On commodity HW • Production rendering • Real-time applications? • Performance trend • 9800 XT: 170M ray-triangle intersections/s • X800 XT PE: 350M ray-triangle intersections/s

  3. GPU Raytracing • Promising early results • Simple scenes • Uniform grid • Problems with complex scenes • Hierarchical accelerator (kd-tree) • Improve scalability

  4. Outline • Background • GPU Raytracing • KD-Tree Algorithm • KD-Restart, KD-Backtrack • Results • Future Work

  5. Background • RayEngine [Carr et al. 2002] • Parallel ray-triangle intersection • Host controls culling • [Purcell et al. 2002] • Entire raytracing pipeline • Many rays required for efficiency • Uniform Grid

  6. Why not KD-Tree? • Uniform grid acceleration structure • Regular structure = efficient traversal • Regular structure = poor partitioning • KD-Trees • Adapt to scene complexity • Compact storage, efficient traversal • “Best” for CPU raytracing [Havran 2000]

  7. KD-Tree (figure: example kd-tree with split planes X, Y, Z over primitives A-D; the ray is clipped to the interval [tmin, tmax])

  8. KD-Tree Traversal (figure: the example tree, with leaves A-D visited in ray order)
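
For reference, a minimal C++ sketch of the conventional stack-based traversal this figure illustrates, and that slide 9 says is hard to run per ray in a fragment program. The node layout and the names (KdNode, Ray, intersectLeaf, traverse) are illustrative assumptions, not taken from the slides.

    #include <limits>
    #include <utility>
    #include <vector>

    struct KdNode {
        float split;              // split-plane position (internal nodes)
        int   axis;               // 0/1/2 = x/y/z; -1 marks a leaf
        int   left, right;        // child indices (internal nodes)
        int   firstTri, triCount; // triangle range (leaf nodes)
    };

    struct Ray { float o[3], d[3]; };

    // Placeholder: test the ray against a leaf's triangles; return hit distance or +inf.
    float intersectLeaf(const KdNode&, const Ray&)
    {
        return std::numeric_limits<float>::infinity();
    }

    float traverse(const std::vector<KdNode>& nodes, const Ray& ray,
                   float tmin, float tmax)
    {
        struct StackEntry { int node; float tmin, tmax; };
        std::vector<StackEntry> stack;  // the per-ray stack that slide 9 cannot keep on the GPU
        int node = 0;                   // start at the root

        for (;;) {
            const KdNode& n = nodes[node];
            if (n.axis >= 0) {          // internal node: choose which children the ray visits
                float t = (n.split - ray.o[n.axis]) / ray.d[n.axis];
                int nearChild = n.left, farChild = n.right;
                if (ray.d[n.axis] < 0.0f) std::swap(nearChild, farChild);
                if (t >= tmax)      node = nearChild;      // only the near child overlaps
                else if (t <= tmin) node = farChild;       // only the far child overlaps
                else {                                     // both: push far, descend near
                    stack.push_back({farChild, t, tmax});
                    node = nearChild;
                    tmax = t;
                }
            } else {                    // leaf: intersect; otherwise pop the next subtree
                float hit = intersectLeaf(n, ray);
                if (hit <= tmax) return hit;               // nearest intersection found
                if (stack.empty()) return std::numeric_limits<float>::infinity();
                node = stack.back().node;
                tmin = stack.back().tmin;
                tmax = stack.back().tmax;
                stack.pop_back();
            }
        }
    }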

  9. Per-Fragment Stacks • Parallel (per-ray) push • No indexed write in fragment program • Per-ray stack storage • [Ernst et al. 2004] • Emulate push with extra passes • Impractical, slow

  10. Our Contribution • Stackless kd-tree traversal algorithms • KD-Restart • KD-Backtrack

  11. Observation (figure: the example tree) • Current leaf's tmax = next leaf's tmin

  12. KD-Restart (figure: the example tree) • Standard traversal • Omit stack operations • Proceed to 1st leaf • If no intersection • Advance (tmin, tmax) • Restart from root • Proceed to next leaf
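
A minimal sketch of the restart loop described above, reusing the illustrative KdNode, Ray, and intersectLeaf declarations from the earlier traversal sketch; robustness details (e.g. an epsilon to guarantee that tmin always advances) are omitted.

    float kdRestart(const std::vector<KdNode>& nodes, const Ray& ray,
                    float sceneTmin, float sceneTmax)
    {
        float tmin = sceneTmin;

        while (tmin < sceneTmax) {
            int   node = 0;           // restart from the root
            float tmax = sceneTmax;   // clipped down to the current leaf's exit as we descend

            // Downward traversal as before, but the far child is never pushed.
            while (nodes[node].axis >= 0) {
                const KdNode& n = nodes[node];
                float t = (n.split - ray.o[n.axis]) / ray.d[n.axis];
                int nearChild = n.left, farChild = n.right;
                if (ray.d[n.axis] < 0.0f) std::swap(nearChild, farChild);
                if (t >= tmax)      node = nearChild;
                else if (t <= tmin) node = farChild;
                else { node = nearChild; tmax = t; }   // clip instead of pushing far
            }

            float hit = intersectLeaf(nodes[node], ray);
            if (hit <= tmax) return hit;   // nearest hit lies in this leaf

            // Miss: the current leaf's tmax is the next leaf's tmin (slide 11),
            // so advance the interval and restart from the root.
            tmin = tmax;
        }
        return std::numeric_limits<float>::infinity();
    }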

  13. KD-Restart • Restart traversal after each leaf • m leaves • Average depth d • Cost O(m*d) • Balanced tree of n nodes • Upper bound: O(n log(n)) • Standard algorithm: O(n) • Expected: O( log(n) )
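
Restating the bound above as a one-line derivation (with $m$, $d$, and $n$ as defined on this slide):

    $\mathrm{cost}_{\text{KD-Restart}} = \sum_{i=1}^{m} \mathrm{depth}(\text{leaf}_i) \le m \cdot d$, with $d = O(\log n)$ for a balanced tree,

so a ray that visits every leaf ($m = O(n)$) costs $O(n \log n)$, versus $O(n)$ for stack-based traversal, while a typical ray visits only a few leaves, which is why the expected cost stays $O(\log n)$.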

  14. Observation (figure: the example tree) • Ancestor of A is parent of Z

  15. KD-Backtrack (figure: the example tree) • If no intersection • Advance (tmin, tmax) • Start backtracking • If node intersects (tmin, tmax) • Resume traversal • Proceed to next leaf

  16. KD-Backtrack • Backtrack after leaf • Revisits previous nodes • At most twice: from left, right • Within constant factor of standard traversal • Upper bound: O(n) • Expected: O( log(n) ) • Requires additional storage • Parent pointers • Bounding boxes for internal nodes
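
A sketch of how the backtracking phase could look, using the parent pointers and per-node bounding boxes listed on this slide as the extra storage. KdNodeBT, boxExit, boxOverlaps, and intersectLeafBT are assumed helper names (declarations only), and Ray is reused from the first sketch.

    struct KdNodeBT {
        float split;
        int   axis;                 // -1 marks a leaf
        int   left, right;
        int   parent;               // parent pointer; -1 at the root
        float bmin[3], bmax[3];     // bounding box of this node's cell
        int   firstTri, triCount;
    };

    // Placeholder helpers. The overlap test must require the ray to leave the box
    // strictly beyond tmin, so the upward walk cannot return to the leaf just missed.
    float intersectLeafBT(const KdNodeBT&, const Ray&);   // hit distance or +inf
    float boxExit(const KdNodeBT&, const Ray&);           // t at which the ray leaves the box
    bool  boxOverlaps(const KdNodeBT&, const Ray&, float tmin, float tmax);

    float kdBacktrack(const std::vector<KdNodeBT>& nodes, const Ray& ray,
                      float sceneTmin, float sceneTmax)
    {
        float tmin = sceneTmin;
        int   node = 0;                              // resume point; initially the root

        while (tmin < sceneTmax) {
            float tmax = boxExit(nodes[node], ray);  // clip to the resume node's cell

            // Downward phase: the same stackless descent as KD-Restart,
            // but starting at the resume node instead of the root.
            while (nodes[node].axis >= 0) {
                const KdNodeBT& n = nodes[node];
                float t = (n.split - ray.o[n.axis]) / ray.d[n.axis];
                int nearChild = n.left, farChild = n.right;
                if (ray.d[n.axis] < 0.0f) std::swap(nearChild, farChild);
                if (t >= tmax)      node = nearChild;
                else if (t <= tmin) node = farChild;
                else { node = nearChild; tmax = t; }
            }

            float hit = intersectLeafBT(nodes[node], ray);
            if (hit <= tmax) return hit;

            // Miss: advance the interval, then climb parent pointers until a node's box
            // still overlaps the ray beyond the new tmin. Each node is revisited at most
            // twice (once from each child), which is the O(n) bound quoted above.
            tmin = tmax;
            while (node != -1 && !boxOverlaps(nodes[node], ray, tmin, sceneTmax))
                node = nodes[node].parent;
            if (node == -1) break;                   // climbed past the root: no hit ahead
        }
        return std::numeric_limits<float>::infinity();
    }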

  17. Implementation • Built GPU raytracer in Brook [Buck et al.] • 4 intersection schemes: • Brute Force • Uniform Grid • KD-Restart • KD-Backtrack

  18. Scenes • Stanford Bunny: 69,451 triangles • Cornell Box: 32 triangles • BART Robots: 71,708 triangles • BART Kitchen: 110,561 triangles

  19. Results (chart: relative speedup over brute-force intersection for the Box, Bunny, Robots, and Kitchen scenes)

  20. Results (chart: rays in each state throughout traversal)

  21. Discussion • Absolute performance • Trails the best CPU implementations by 5-6x • Sources of inefficiency • Load balancing • Data reuse

  22. Load Balancing • Subset of rays intersecting, traversing • Occlusion queries to select kernel • Early-Z to cull inactive rays • Approximately 5x overhead • Query, kernel switch overhead • Worse with fewer rays

  23. Data Reuse • Every kernel • Loads ray origin/direction • Load/Store traversal state • Consumes streaming bandwidth • We are bandwidth-limited • CPU implementation stores these in registers

  24. Branching • Merge multiple passes into larger kernel • Fragment branches for load balancing • Avoid load/store of reused data • Current branching has high overhead • Shifts efficiency burden to HW

  25. Conclusion • Stackless Traversal • Allows efficient GPU kd-tree • Scales to larger, more complex scenes • Future Work • Changes in HW • Alternative acceleration structures • “Out-of-core” scenes • Dynamic scenes

  26. Acknowledgements • Tim Purcell (NVIDIA) • Streaming raytracer • Mark Segal (ATI) • Demo machine • NVIDIA, ATI: HW • DARPA, Rambus: Funding

  27. Questions