

Kd-Jump: A Path-Preserving Stackless Traversal for Faster Isosurface Ray Tracing on GPUs. David Meirion Hughes, Ik Soo Lim. Bangor University, UK.

## Kd-Jump


**Kd-Jump**
A Path-Preserving Stackless Traversal for Faster Isosurface Ray Tracing on GPUs. David Meirion Hughes. Ik Soo Lim. Bangor University, UK.

**Problem Setting and Previous work**
Problem Setting

**Problem Setting: Ray Tracing**
• Tracing rays from camera
• Find the intersections
• Avoid uninteresting areas
• Acceleration structure
• Division of space
• Requires ray traversal

**Problem Setting: Traversal of Kd-Trees**
• Downward Traversal
• Two branch choices
• Remember furthest
• Traverse nearest
• Test for intersections
• If branch had no hit?
• Traversal Restore
• Go back to other branch
[Figure: a ray and its stack; each stack entry holds tnear, tfar and node*, e.g. 0.5, 0.75, 0x1...]

**Problem Setting: GPUs**
• Several MPUs
• Parallel execution
• Kernels
• Thousands of threads
• Lightweight code
• On-chip memory very fast
• On-board memory slow

**Problem Setting: Ray Tracing on GPUs**
• Stack – still a problem?
• Memory Size
• Coalesced Access
• One Stack element:
• Ray Segment
• Node address/look-up
• times depth-of-tree
• times ray-count
• One kernel call
[Figure: stack entries (tnear, tfar, node*) for Ray 1, Ray 2, Ray 3, ...; labels: One Memory Transaction, Two Memory Transactions, Three Memory Transactions]

**Previous Work: Stackless Traversal**
• Avoid using stack
• Current thinking
• Less memory
• No global memory use
• Faster
[Figure: stack entries (tnear, tfar, node*) for Ray 1, Ray 2, Ray 3, ...]

**Previous Work: Stackless Traversal**
• Avoid using stack
• Kd-Restart
• Restart From Root
• + Very little memory
• - Revisits previous nodes
• - Longer thread life
• - Exacerbates incoherence
[Figure label: Tested Twice]

**Previous Work: Stackless Traversal**
• Avoid using stack
• Kd-Restart
• Kd-Backtrack
• Backtrack up tree
• + Very little memory
• + Better than Kd-Restart
• - Revisits previous nodes
• - Longer thread life
• - Exacerbates incoherence
[Figure label: Tested Twice]

**Previous Work: Stackless Traversal**
• Avoid using stack
• Kd-Restart
• Kd-Backtrack
• Ropes
• Nodes have neighbour links
• + Shorter ray life
• - Lots of extra memory
[Figure label: Additional Pointers (per node)]

**Motivation and Description**
Kd-Jump

**Kd-Jump: Motivation**
• Goal:
• Same path as Stack method
• Least amount of memory
• How:
• Indices rather than pointers
• Down traverse with equation
• Return using inverse
• Binary bits for return markers

**Kd-Jump: Index Reference**
• Each node referenced by index
• x, y, z, etc...
• depth
[Figure labels: Level, Memory Blocks, [x,y,z], memory map]

**Kd-Jump: Method Description**
• Traversal into children
• Update an index element
• Determined by the split dimension
• Multiply by 2
• Add child offset f
C = 2x + f
[Figure: an x-dimension split takes [x,y,z] to [2x+f, y, z], with f = 0 or f = 1]

**Kd-Jump: Method Description**
• Traversal back to parent
• Apply inverse of downward step
• Can replace f with floor function
• Do not need to consider what f was
• f = 0 or 1 only
(C - f)/2 = x, so floor(C/2) = x
[Figure: undoing an x-dimension split takes [x,y,z] to [floor(x/2), y, z]]

**Kd-Jump: Method Description**
• Traversal to common parent
• Apply inverse on all indices
• Divide elements by power of 2
• Number of splits
• Matrix of Split information
• Store in constant memory
• (cached)
• (alternative) Store on the fly
floor(x/2^1), floor(y/2^2), ...
[Figure: matrix of split information applied to [x,y,z]]

**Kd-Jump: Method Description**
• Determine jump amount
• Mark common parents
• 1 bit
• Store in MSB order
• On return
• Count trailing (rightmost) zero bits
• This is the return depth
• Subtract from current depth
• Jump amount
[Figure: 32-bit register of marker bits, e.g. 0 1 0 0 0]

**Kd-Jump: Method Description**
• Re-clip Ray
• Bounds stored or computed
[Figure labels: Bound X, Bound Y]

**Kd-Jump: Scope**
• Nodes referenced with indices
• Traversal equations invertible
• Forget route choices in inverse
• Index-to-memory map
• Limit wasted memory
• Balanced kd-tree
• Implicit kd-tree
• Requires node bounds
• Re-computed with implicit kd-tree

**Kd-Jump: Isosurfacing with implicit Kd-tree**
• Wald’s implicit Kd-tree
• min/max of node branch
• Left-balanced
• No-waste memory map
• Bounds/splits computed

**Implementation: Isosurfacing with implicit Kd-tree**
• Minor differences
• Node test
• Test prior to traversing
• Reduces number of returns
• Stack, kd-jump, kd-restart

**Results: Isosurfacing with implicit Kd-tree**
• Kd-Jump faster
• Time a ray stays active is important (kd-restart)
• Stack only slightly slower
• High occupancy (75%)
• High ray coherence = automatic coalesced access
[Chart: frames per second, averaged across multiple iso-values and views]

**Results: Isosurfacing with implicit Kd-tree**
• All use one 32-bit register
• stack_size, tfar_max, depth_flags
• Stack memory allocated for all rays
• Single kernel
• Constant memory as fast as registers
• Once data cached
[Chart: memory use]

**Analysis: Kd-Jump**
• Theoretical performance
• Memory access not hidden
• However, perfectly coalesced

**Analysis: Kd-Jump**
• Bottlenecks
• Stack: memory bound
• Kd-Jump: computation bound

**Hybrid Kd-tree: Exploiting Texture caching**
• Build implicit tree
• Depth threshold
• Volume stepping
• Texture cache
• Very fast
• Threshold depth?
• Intersection method
• Iso-surface
• View direction

**Hybrid Kd-tree: Results**
[Chart: frames per second, averaged across multiple views]

**Conclusions**
• Kd-Jump
• Stackless
• Index based
• Immediate backtrack to common parent
• No dependency on ray coherency
• At least if bounds can be computed
• Hybrid Kd-Jump
• Texture cache over Acceleration Structure
• Variable depth threshold of branches
• View, intersection method, iso-value

**Conclusions**
• Future prediction
• Memory access and speed improving
• Current trend
• Usefulness of Stackless
• Reduced memory cost
• Reduced dependency on coherency
• Fewer iterations (Ropes) than stack
• Big question: one kernel versus many?
• Stack favours one kernel
• i.e., no reorganising (can break coalesced access)
• Can organise into groups of same depth though?
• Many kernels = better device occupancy
• Memory access better hidden

**Future Work**
• Kd-Jump with general Kd-Trees?
• Real-time explicit from implicit
• CUDA 3.0
• Dynamic Warps?
• Ideal for ray tracing?
• Inter-device communication?

**Addendum: Indices for general case kd-trees?**
• Nodes need bounds for re-clip
• Accept the cost?
• Compute them somehow?
• BVH stores them anyway
• Memory Map
• Very difficult to remove wasted space
• Feasible to minimise waste?
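
The index arithmetic from the Method Description slides (child step C = 2x + f, parent step floor(C/2), division by a power of 2 to reach a common parent, and one return-marker bit per level) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `SPLIT_DIMS`, `descend`, `trailing_zeros` and `jump_to_pending` are hypothetical. The sketch assumes the split axis cycles x, y, z with depth, that markers are pushed with a left shift so the newest marker sits in bit 0, and that the child being left can be identified from the low bit of its index.

```python
# Hypothetical split axis per depth (0 = x, 1 = y, 2 = z), assumed round-robin.
SPLIT_DIMS = [0, 1, 2, 0, 1, 2]

def descend(index, depth, flags, f, far_pending):
    """Enter child f (0 or 1) of the node at `depth`: C = 2x + f on the split axis."""
    d = SPLIT_DIMS[depth]
    child = list(index)
    child[d] = 2 * child[d] + f
    # Push one return marker: 1 if the other child still has to be visited.
    flags = (flags << 1) | (1 if far_pending else 0)
    return child, depth + 1, flags

def trailing_zeros(flags):
    """Number of zero bits below the lowest set bit (flags must be non-zero)."""
    return (flags & -flags).bit_length() - 1

def jump_to_pending(index, depth, flags):
    """Jump to the nearest ancestor with a pending child, then enter that child."""
    levels = trailing_zeros(flags) + 1        # jump amount
    target = depth - levels                   # depth of the common parent
    undone = SPLIT_DIMS[target:depth]         # axes split along the path being undone
    axis = SPLIT_DIMS[target]
    # Offset f of the child already visited under the common parent.
    near_f = (index[axis] >> (undone.count(axis) - 1)) & 1
    # Inverse of all downward steps: divide each element by 2^(number of splits).
    parent = [v >> undone.count(a) for a, v in enumerate(index)]
    # Enter the other child; it leaves no marker because its sibling is done.
    return descend(parent, target, flags >> levels, 1 - near_f, False)

# Walk down three levels, marking only the root as having a pending child.
idx, depth, flags = descend([0, 0, 0], 0, 0, 1, True)
idx, depth, flags = descend(idx, depth, flags, 0, False)
idx, depth, flags = descend(idx, depth, flags, 1, False)
# One jump returns to the root and enters its other child.
idx, depth, flags = jump_to_pending(idx, depth, flags)
```

When `flags` is zero there is no pending branch left and the traversal ends. On a GPU the trailing-zero count would normally map to a hardware bit-scan instruction rather than the `bit_length` trick used here.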
