Interactive Distributed Ray Tracing of Highly Complex Models - PowerPoint PPT Presentation

Presentation Transcript

  1. Interactive Distributed Ray Tracing of Highly Complex Models Ingo Wald University of Saarbrücken http://graphics.cs.uni-sb.de/~wald http://graphics.cs.uni-sb.de/rtrt

  2. Reference Model (12.5 million tris)

  3. Power Plant- Detail Views

  4. Previous Work
  • Interactive Rendering of Massive Models (UNC)
  • Framework of algorithms:
  • Textured depth meshes (96% reduction in #tris)
  • View-frustum culling & LOD (50% each)
  • Hierarchical occlusion maps (10%)
  • Extensive preprocessing required
  • Entire model: ~3 weeks (estimated)
  • Framerate (Onyx): 5 to 15 fps
  • Needs shared-memory supercomputer

  5. Previous Work II
  • Memory-Coherent Ray Tracing, Pharr (Stanford)
  • Explicit cache management for rays and geometry
  • Extensive reordering and scheduling
  • Too slow for interactive rendering
  • Provides global illumination
  • Parallel Ray Tracing, Parker et al. (Utah) & Muuss (ARL)
  • Needs shared-memory supercomputer
  • Interactive Rendering with Coherent Ray Tracing (Saarbrücken, EG 2001)
  • IRT on (cheap) PC systems
  • Avoiding CPU stalls is crucial

  6. Previous Work: Lessons Learned
  • Rasterization is possible for massive models… but not 'straightforward' (UNC)
  • Interactive ray tracing is possible (Utah, Saarbrücken)
  • Easy to parallelize
  • Cost is only logarithmic in scene size
  • Conclusion: parallel, interactive ray tracing should work great for massive models

  7. Parallel IRT
  • Parallel interactive ray tracing
  • Supercomputer: more threads…
  • PCs: distributed IRT on a CoW (cluster of workstations)
  • Distributed CoW: need fast access to the scene data
  • Simplistic access to scene data:
  • mmap + caching, all done automatically by the OS
  • Either: replicate the scene
  • Extremely inflexible
  • Or: access a single copy of the scene over NFS (mmap)
  • Network issues: latencies/bandwidth

  8. Simplistic Approach
  Caching via OS support won't work:
  • The OS can't even address more than 2 GB of data…
  • Massive models >> 2 GB!
  • Also an issue when replicating the scene…
  • Process stalls due to demand paging
  • Stalls are very expensive!
  • Dual 1 GHz PIII: 1 ms stall = 1 million cycles = about 1000 rays!
  • The OS stalls the process automatically → reordering impossible…
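The cost figure on slide 8 is easy to verify: at 1 GHz, a 1 ms demand-paging stall burns one million cycles. A minimal sketch of that arithmetic, where the ~1000 cycles/ray figure is an assumption back-derived from the slide's "about 1000 rays" claim:

```python
# Back-of-the-envelope cost of a demand-paging stall, using the
# slide's numbers: a 1 GHz CPU loses 1 million cycles per 1 ms stall.
# CYCLES_PER_RAY is an assumption implied by "about 1000 rays".

CPU_HZ = 1_000_000_000   # 1 GHz Pentium III
CYCLES_PER_RAY = 1_000   # implied by: 1M lost cycles ~ 1000 rays

def rays_lost(stall_seconds: float) -> int:
    """Rays that could have been traced during a stall of this length."""
    return round(stall_seconds * CPU_HZ / CYCLES_PER_RAY)

print(rays_lost(0.001))  # 1 ms stall -> 1000 rays
```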

  9. Distributed Scene Access
  • Simplistic approach doesn't work…
  • Need 'manual' caching and memory management

  10. Caching Scene Data
  • 2-level hierarchy of BSP trees
  • Caching based on self-contained "voxels"
  • Clients need only the top-level BSP tree (a few KB)
  • Straightforward implementation…
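The client-side scheme on slide 10 can be sketched as a small cache keyed by voxel ID: the client holds only the tiny top-level BSP tree, and any leaf whose voxel is not resident triggers a fetch from the model server. This is an illustrative sketch, not the authors' code; the names `VoxelCache` and `fetch_voxel` are assumptions, and a simple LRU policy stands in for whatever replacement policy the real system used.

```python
# Sketch of a client-side voxel cache: top-level BSP leaves name
# self-contained voxels, fetched from the model server on a miss.
from collections import OrderedDict

class VoxelCache:
    def __init__(self, capacity, fetch_voxel):
        self.capacity = capacity
        self.fetch_voxel = fetch_voxel   # e.g. a network request to the model server
        self.cache = OrderedDict()       # voxel_id -> voxel data, in LRU order

    def get(self, voxel_id):
        if voxel_id in self.cache:
            self.cache.move_to_end(voxel_id)   # mark as recently used
            return self.cache[voxel_id]
        voxel = self.fetch_voxel(voxel_id)     # cache miss: go to the server
        self.cache[voxel_id] = voxel
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return voxel

# Toy usage: the "server" just fabricates voxel payloads.
cache = VoxelCache(capacity=2, fetch_voxel=lambda vid: f"voxel-{vid}")
cache.get(1); cache.get(2); cache.get(1)
cache.get(3)                       # capacity exceeded: evicts voxel 2
print(2 in cache.cache)            # False
```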

  11. BSP-Tree: Structure and Caching Grain

  12. Caching Scene Data
  • Preprocessing: splitting into voxels
  • Simple spatial sorting (BSP-tree construction)
  • Out-of-core algorithm due to model size
  • File-size limit and address space (2 GB)
  • Simplistic implementation: 2.5 hours
  • Model server
  • One machine serves the entire model
  • Single server = potential bottleneck!
  • Could easily be distributed
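The "simple spatial sorting" of slide 12 amounts to recursively splitting the triangle set until each piece is small enough to be a self-contained voxel. A minimal in-memory sketch of that idea, under the assumption of median splits along the longest axis (the real system builds BSP trees out of core, which this deliberately ignores):

```python
# Illustrative preprocessing sketch: recursively split triangle
# centroids along the axis of largest extent until each "voxel"
# holds at most max_per_voxel triangles.

def split_into_voxels(centroids, max_per_voxel=4):
    """centroids: list of (x, y, z) triangle centroids."""
    if len(centroids) <= max_per_voxel:
        return [centroids]                      # one finished voxel
    # pick the axis with the largest spatial extent
    axis = max(range(3), key=lambda a: max(c[a] for c in centroids) -
                                       min(c[a] for c in centroids))
    order = sorted(centroids, key=lambda c: c[axis])
    mid = len(order) // 2                       # median split
    return (split_into_voxels(order[:mid], max_per_voxel) +
            split_into_voxels(order[mid:], max_per_voxel))

voxels = split_into_voxels([(x, 0.0, 0.0) for x in range(10)], max_per_voxel=4)
print(len(voxels), [len(v) for v in voxels])
```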

  13. Hiding CPU Stalls
  • Caching alone does not prevent stalls!
  • Avoiding stalls → reordering
  • Suspend rays that would stall on missing data
  • Fetch missing data asynchronously!
  • Immediately continue with other rays
  • Potentially no CPU stall at all!
  • Resume stalled rays once their data is available
  • Can only hide 'some' latency → minimize voxel-fetching latencies
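The suspend/resume loop of slide 13 can be sketched with two queues: rays ready to trace, and rays parked per missing voxel. This is a schematic stand-in for the real scheduler; `needed_voxel`, `request_fetch`, and `deliver_fetches` are hypothetical hooks, and "tracing" a ray is reduced to checking that its voxel is resident.

```python
# Reordering sketch: a ray that would touch a non-resident voxel is
# suspended, an asynchronous fetch is issued, and other rays keep the
# CPU busy; the ray resumes when its voxel arrives.
from collections import deque

def trace_all(rays, needed_voxel, resident, request_fetch, deliver_fetches):
    ready = deque(rays)
    suspended = {}                      # voxel id -> rays waiting on it
    finished = []
    while ready or suspended:
        while ready:
            ray = ready.popleft()
            voxel = needed_voxel(ray)
            if voxel in resident:
                finished.append(ray)    # data present: trace to completion
            else:
                suspended.setdefault(voxel, []).append(ray)
                request_fetch(voxel)    # asynchronous fetch, no CPU stall
        for voxel in deliver_fetches():           # fetches that completed
            resident.add(voxel)
            ready.extend(suspended.pop(voxel, []))  # resume waiting rays
    return finished

# Toy run: rays 1 and 3 need voxel 1 (not resident), ray 2 needs voxel 0.
pending = []
order = trace_all(rays=[1, 2, 3],
                  needed_voxel=lambda r: r % 2,
                  resident={0},
                  request_fetch=pending.append,
                  deliver_fetches=lambda: [pending.pop()] if pending else [])
print(order)  # ray 2 finishes first, rays 1 and 3 resume later
```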

  14. Reducing Latencies
  • Reduce network latencies
  • Prefetching?
  • Hard to predict data accesses several ms in advance!
  • Latency is dominated by transmission time (100 Mbit/s → 1 MB = 80 ms = 160 million cycles!)
  • Reduce the transmitted data volume
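Slide 14's numbers check out: pushing 1 MB through a 100 Mbit/s link takes 80 ms of pure transmission time, which is why the talk attacks data volume rather than round-trip latency. A quick sketch of the arithmetic (decimal units assumed, matching the slide):

```python
# Transmission time dominates voxel-fetch latency: at 100 Mbit/s,
# 1 MB takes 80 ms on the wire, regardless of round-trip latency.

def transfer_ms(megabytes, link_mbit_per_s):
    # 1 MB = 8,000,000 bits; 1 Mbit/s = 1,000,000 bit/s (decimal units)
    return megabytes * 8_000_000 * 1000 / (link_mbit_per_s * 1_000_000)

print(transfer_ms(1, 100))   # 80.0 ms, as on the slide
```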

  15. Reducing Bandwidth
  • Compression of voxel data
  • The LZO library provides ~3:1 compression
  • Compared to the original transmission time, the decompression cost is negligible!
  • Dual-CPU system: sharing of the voxel cache
  • Amortizes bandwidth, storage, and decompression effort over both CPUs… even better for more CPUs
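The trade-off on slide 15 is easy to demonstrate. The talk uses LZO; the sketch below substitutes `zlib` at its fastest setting, since LZO is not in the Python standard library, and the 256 KB repetitive "geometry" buffer is fabricated for illustration. The point survives the substitution: compression shrinks what must cross the 100 Mbit/s link, and decompression is cheap relative to the 80 ms/MB it saves.

```python
# Compressing voxel data before transmission; zlib stands in for the
# LZO library used in the talk. Fast compression level, round-trip
# verified before claiming any bandwidth savings.
import zlib

voxel = b"\x00\x01\x02\x03" * 65536          # 256 KB of repetitive "geometry"
packed = zlib.compress(voxel, level=1)       # fast setting, in the spirit of LZO

ratio = len(voxel) / len(packed)
print(ratio > 3, zlib.decompress(packed) == voxel)
```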

  16. Load Balancing
  • Demand-driven distribution of tiles (32x32)
  • Buffering of work tiles on the client
  • Avoids communication latency
  • Frame-to-frame coherence improves caching
  • Keep rays on the same client
  • Simple: keep tiles on the same client (implemented)
  • Better: assign tiles based on reprojected pixels (future)
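The "keep tiles on the same client" idea of slide 16 can be sketched as a greedy assignment with affinity: an idle client preferentially receives the tile it rendered last frame, so its voxel cache stays warm. Names are illustrative, and the sketch assumes at least as many idle clients as tiles; the real system is fully demand-driven rather than assigning a whole frame at once.

```python
# Demand-driven tile assignment with frame-to-frame affinity: reusing
# last frame's owner for a tile keeps that client's voxel cache warm.

def assign_tiles(tiles, idle_clients, last_owner):
    """Greedy: prefer last frame's owner of each tile if it is idle."""
    assignment = {}
    idle = list(idle_clients)
    for tile in tiles:
        owner = last_owner.get(tile)
        if owner in idle:
            idle.remove(owner)          # cache-friendly: same client again
        else:
            owner = idle.pop(0)         # otherwise: any idle client
        assignment[tile] = owner
    return assignment

a = assign_tiles(tiles=["t0", "t1", "t2"],
                 idle_clients=["c0", "c1", "c2"],
                 last_owner={"t1": "c2"})   # c2 rendered t1 last frame
print(a)  # t1 stays on c2
```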

  17. Results
  • Setup
  • Seven dual Pentium III 800-866 MHz machines as rendering clients
  • 100 Mbit FastEthernet
  • One display & model server (same machine)
  • GigabitEthernet (already necessary for the pixel data)
  • Power plant performance
  • 3-6 fps in a pure C implementation
  • 6-12 fps with SSE support

  18. Animation: Framerate vs. Bandwidth → Latency hiding works!

  19. Scalability • Server bottleneck beyond 12 CPUs → Distribute the model server!

  20. Performance: Detail Views • Framerate (640x480): 3.9-4.7 fps (seven dual P-III 800-866 MHz clients, no SSE)

  21. Shadows and Reflections • Framerate: 1.4-2.2 fps (no SSE)

  22. Demo

  23. Conclusions
  • IRT works great for highly complex models!
  • Distribution issues can be solved
  • At least as fast as sophisticated HW techniques
  • Less preprocessing
  • Cheap
  • Simple & easy to extend (shadows, reflections, shading, …)

  24. Future Work
  • Smaller cache granularity
  • Distributed scene server
  • Cache-coherent load balancing
  • Dynamic scenes & instances
  • Hardware support for ray tracing

  25. Acknowledgments • Anselmo Lastra, UNC • Power plant reference model … other complex models are welcome…

  26. Questions ? For further information visit http://graphics.cs.uni-sb.de/rtrt

  27. Four Power Plants (50 million tris)

  28. Detailed View of Power Plant • Framerate: 4.7 fps (seven dual P-III 800-866 MHz clients, no SSE)

  29. Detail View: Furnace • Framerate: 3.9 fps, no SSE

  30. Overview • Reference Model • Previous Work • Distribution Issues • Massive Model Issues • Images & Demo