
Interactive Distributed Ray Tracing of Highly Complex Models



  1. Interactive Distributed Ray Tracing of Highly Complex Models Ingo Wald University of Saarbrücken http://graphics.cs.uni-sb.de/~wald http://graphics.cs.uni-sb.de/rtrt

  2. Reference Model (12.5 million tris)

  3. Power Plant - Detail Views

  4. Previous Work
  • Interactive Rendering of Massive Models (UNC)
    • Framework of algorithms
      • Textured depth meshes (96% reduction in #tris)
      • View-frustum culling & LOD (50% each)
      • Hierarchical occlusion maps (10%)
    • Extensive preprocessing required
      • Entire model: ~3 weeks (estimated)
    • Framerate (Onyx): 5 to 15 fps
    • Needs shared-memory supercomputer

  5. Previous Work II
  • Memory-Coherent Ray Tracing, Pharr et al. (Stanford)
    • Explicit cache management for rays and geometry
    • Extensive reordering and scheduling
    • Too slow for interactive rendering
    • Provides global illumination
  • Parallel Ray Tracing, Parker et al. (Utah) & Muuss (ARL)
    • Needs shared-memory supercomputer
  • Interactive Rendering with Coherent Ray Tracing (Saarbrücken, EG 2001)
    • IRT on (cheap) PC systems
    • Avoiding CPU stalls is crucial

  6. Previous Work: Lessons Learned…
  • Rasterization is possible for massive models… but not 'straightforward' (UNC)
  • Interactive Ray Tracing is possible (Utah, Saarbrücken)
    • Easy to parallelize
    • Cost is only logarithmic in scene size
  • Conclusion: Parallel, Interactive Ray Tracing should work great for Massive Models

  7. Parallel IRT
  • Parallel Interactive Ray Tracing
    • Supercomputer: more threads…
    • PCs: Distributed IRT on a CoW (cluster of workstations)
  • Distributed CoW: need fast access to the scene data
  • Simplistic access to scene data
    • mmap + caching, all done automatically by the OS
    • Either: replicate the scene
      • Extremely inflexible
    • Or: access a single copy of the scene over NFS (mmap)
      • Network issues: latencies / bandwidth
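
To make the "simplistic" variant concrete, here is a minimal sketch of single-copy access via mmap: the scene file (e.g. on an NFS mount) is memory-mapped and the OS pages triangles in on demand, which is exactly where the uncontrolled stalls discussed on the next slide come from. The file layout, Triangle struct and function name are illustrative, not the actual RTRT scene format.

```cpp
// Sketch of the 'simplistic' approach: map one shared scene file and let
// the OS page triangles in on demand.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>

struct Triangle { float v0[3], v1[3], v2[3]; };   // illustrative layout

const Triangle* mapScene(const char* path, size_t& count)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;

    struct stat st;
    fstat(fd, &st);
    count = st.st_size / sizeof(Triangle);

    // Read-only mapping: the first touch of any page may trigger an NFS
    // round trip, and the process stalls until the data has arrived.
    void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after close
    return (base == MAP_FAILED) ? nullptr : static_cast<const Triangle*>(base);
}
```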

  8. Simplistic Approach
  Caching via OS support won't work:
  • The OS can't even address more than 2 GB of data…
    • Massive models >> 2 GB!
    • Also an issue when replicating the scene…
  • Process stalls due to demand paging
    • Stalls are very expensive!
    • Dual 1 GHz PIII: a 1 ms stall = 1 million cycles = about 1000 rays!
  • The OS stalls the process automatically → reordering impossible…
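
A back-of-the-envelope check of the stall cost quoted above; the ~1000 cycles-per-ray figure is the slide's own rough estimate, not a measured constant.

```cpp
#include <cstdio>

int main()
{
    const double clock_hz       = 1.0e9;   // 1 GHz Pentium-III
    const double stall_seconds  = 1.0e-3;  // one demand-paging stall
    const double cycles_per_ray = 1000.0;  // rough per-ray cost from the slide

    const double lost_cycles = stall_seconds * clock_hz;     // = 1e6 cycles
    const double lost_rays   = lost_cycles / cycles_per_ray; // ~1000 rays
    std::printf("a 1 ms stall costs %.0f cycles, i.e. ~%.0f rays\n",
                lost_cycles, lost_rays);
    return 0;
}
```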

  9. Distributed Scene Access
  • The simplistic approach doesn't work…
  • Need 'manual' caching and memory management

  10. Caching Scene Data
  • 2-level hierarchy of BSP trees
  • Caching based on self-contained "voxels"
  • Clients need only the top-level BSP tree (a few KB)
  • Straightforward implementation…
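
A sketch of how such a two-level hierarchy and voxel cache might be organized: a small, replicated top-level BSP whose leaves name voxels, each voxel being a self-contained unit (local BSP plus triangles) that can be fetched and cached independently. All class and member names here are illustrative, not the actual RTRT data structures.

```cpp
#include <memory>
#include <unordered_map>
#include <vector>

struct Triangle { float v0[3], v1[3], v2[3]; };

struct LocalBSPNode { /* split plane, child offsets, or a leaf's triangle list */ };

// Self-contained unit of caching: can be built, transmitted, loaded and
// evicted independently of the rest of the scene.
struct Voxel {
    std::vector<LocalBSPNode> bsp;        // the voxel's own BSP tree
    std::vector<Triangle>     triangles;  // geometry contained in the voxel
};

// Node of the small top-level BSP that every client keeps in memory.
struct TopLevelBSPNode {
    int   axis;         // split axis, or -1 for a leaf
    float splitPos;     // position of the split plane
    int   children[2];  // indices of the child nodes (inner nodes only)
    int   voxelId;      // leaf only: which voxel to traverse next
};

// Client-side cache mapping voxel ids to resident voxels.
class VoxelCache {
public:
    // Returns the voxel if it is resident; nullptr means the caller must
    // suspend the ray and request the voxel from the model server.
    Voxel* lookup(int voxelId) {
        auto it = cache_.find(voxelId);
        return it == cache_.end() ? nullptr : it->second.get();
    }
    void insert(int voxelId, std::unique_ptr<Voxel> v) {
        cache_[voxelId] = std::move(v);   // eviction policy omitted here
    }
private:
    std::unordered_map<int, std::unique_ptr<Voxel>> cache_;
};
```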

  11. BSP-Tree: Structure and Caching Grain

  12. Caching Scene Data
  • Preprocessing: splitting into voxels
    • Simple spatial sorting (BSP-tree construction)
    • Out-of-core algorithm due to model size
      • File-size limit and address space (2 GB)
    • Simplistic implementation: 2.5 hours
  • Model server
    • One machine serves the entire model
    • Single server = potential bottleneck!
    • Could easily be distributed
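
The splitting step itself is plain spatial sorting. The sketch below shows only the in-core recursion; the real preprocessing tool has to work out-of-core because of the 2 GB file-size and address-space limits mentioned above. The threshold, the voxel file naming and the Triangle layout are made up for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct Triangle { float v0[3], v1[3], v2[3]; };
struct AABB     { float lo[3], hi[3]; };

static float centroid(const Triangle& t, int axis)
{
    return (t.v0[axis] + t.v1[axis] + t.v2[axis]) / 3.0f;
}

// Write one finished voxel to its own file (placeholder naming scheme).
static void writeVoxel(const std::vector<Triangle>& tris, int& nextVoxelId)
{
    std::string name = "voxel_" + std::to_string(nextVoxelId++) + ".bin";
    if (FILE* f = std::fopen(name.c_str(), "wb")) {
        std::fwrite(tris.data(), sizeof(Triangle), tris.size(), f);
        std::fclose(f);
    }
}

// Recursively split at the spatial median of the longest axis until a
// chunk is small enough to become a self-contained voxel.
void splitIntoVoxels(std::vector<Triangle>& tris, const AABB& box,
                     size_t maxTrisPerVoxel, int& nextVoxelId)
{
    if (tris.size() <= maxTrisPerVoxel) { writeVoxel(tris, nextVoxelId); return; }

    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (box.hi[a] - box.lo[a] > box.hi[axis] - box.lo[axis]) axis = a;
    const float mid = 0.5f * (box.lo[axis] + box.hi[axis]);

    std::vector<Triangle> left, right;
    for (const Triangle& t : tris)
        (centroid(t, axis) < mid ? left : right).push_back(t);
    tris.clear();
    tris.shrink_to_fit();                       // keep peak memory low

    if (left.empty() || right.empty()) {        // degenerate split: stop here
        writeVoxel(left.empty() ? right : left, nextVoxelId);
        return;
    }
    AABB lbox = box, rbox = box;
    lbox.hi[axis] = mid;
    rbox.lo[axis] = mid;
    splitIntoVoxels(left,  lbox, maxTrisPerVoxel, nextVoxelId);
    splitIntoVoxels(right, rbox, maxTrisPerVoxel, nextVoxelId);
}
```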

  13. Hiding CPU Stalls
  • Caching alone does not prevent stalls!
  • Avoiding stalls → reordering
    • Suspend rays that would stall on missing data
    • Fetch the missing data asynchronously!
    • Immediately continue with other rays
      • Potentially no CPU stall at all!
    • Resume stalled rays once the data is available
  • Can only hide 'some' latency → minimize voxel-fetching latencies
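
A sketch of the suspend/resume control flow described above. The cache, fetch and traversal interfaces are illustrative hooks (the traversal routines are only declared), and a real implementation would block on the network instead of spinning when no ray is currently runnable.

```cpp
#include <deque>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct Voxel;                                   // cached voxel data
struct Ray { /* origin, direction, traversal state, pixel id, ... */ };

// Hooks into the ray tracer core (not shown here).
int  nextVoxelFor(const Ray& ray);              // top-level BSP step, -1 = done
void traceRayInVoxel(Ray& ray, Voxel& voxel);   // traversal inside one voxel

// Client-side cache; requestAsync() only records the request in this
// sketch, while the real system posts it to the model server.
class VoxelCache {
public:
    Voxel* lookup(int id) {
        auto it = resident_.find(id);
        return it == resident_.end() ? nullptr : it->second;
    }
    bool isResident(int id)            { return resident_.count(id) != 0; }
    void requestAsync(int id)          { requested_.insert(id); }
    void deliver(int id, Voxel* voxel) { resident_[id] = voxel; requested_.erase(id); }
private:
    std::unordered_map<int, Voxel*> resident_;
    std::unordered_set<int>         requested_;
};

// Trace all rays of one tile, suspending instead of stalling.
void traceTile(std::deque<Ray>& work, VoxelCache& cache)
{
    std::unordered_map<int, std::vector<Ray>> suspended;   // voxelId -> rays

    while (!work.empty() || !suspended.empty()) {
        // 1) Trace every ray whose data is already resident.
        while (!work.empty()) {
            Ray ray = work.front(); work.pop_front();
            const int id = nextVoxelFor(ray);
            if (id < 0) continue;                           // ray finished
            if (Voxel* v = cache.lookup(id)) {
                traceRayInVoxel(ray, *v);
                work.push_back(ray);                        // may enter the next voxel
            } else {
                cache.requestAsync(id);                     // fetch in the background
                suspended[id].push_back(ray);               // park the ray, no stall
            }
        }
        // 2) Resume rays whose voxels have arrived in the meantime.
        //    (A real implementation would wait on the network here
        //    instead of spinning when nothing is resident yet.)
        for (auto it = suspended.begin(); it != suspended.end(); ) {
            if (cache.isResident(it->first)) {
                for (Ray& r : it->second) work.push_back(r);
                it = suspended.erase(it);
            } else {
                ++it;
            }
        }
    }
}
```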

  14. Reducing Latencies
  • Reduce network latencies
    • Prefetching?
      • Hard to predict data accesses several ms in advance!
  • Latency is dominated by transmission time
    (100 Mbit/s → 1 MB = 80 ms = 160 million cycles!)
  • Reduce the transmitted data volume

  15. Reducing Bandwidth
  • Compression of voxel data
    • The LZO library provides roughly 3:1 compression
    • Compared to the original transmission time, the decompression cost is negligible!
  • Dual-CPU systems: sharing of the voxel cache
    • Amortizes bandwidth, storage and decompression effort over both CPUs…
    • Even better for more CPUs
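
A sketch of per-voxel compression with the LZO library (lzo1x, link against liblzo2); the output buffer sizing follows LZO's documented worst-case bound. The surrounding byte-buffer representation of a voxel is illustrative.

```cpp
#include <lzo/lzo1x.h>
#include <cstddef>
#include <vector>

// Compress a voxel's raw bytes before sending them over the network.
bool compressVoxel(const std::vector<unsigned char>& raw,
                   std::vector<unsigned char>& packed)
{
    static const bool ok = (lzo_init() == LZO_E_OK);        // one-time library init
    if (!ok) return false;

    std::vector<unsigned char> wrkmem(LZO1X_1_MEM_COMPRESS); // scratch memory
    packed.resize(raw.size() + raw.size() / 16 + 64 + 3);    // worst-case output size

    lzo_uint packedLen = static_cast<lzo_uint>(packed.size());
    if (lzo1x_1_compress(raw.data(), raw.size(),
                         packed.data(), &packedLen, wrkmem.data()) != LZO_E_OK)
        return false;
    packed.resize(packedLen);
    return true;
}

// Decompress on the rendering client; rawSize travels with the block.
bool decompressVoxel(const std::vector<unsigned char>& packed, size_t rawSize,
                     std::vector<unsigned char>& raw)
{
    raw.resize(rawSize);
    lzo_uint outLen = static_cast<lzo_uint>(rawSize);
    return lzo1x_decompress(packed.data(), packed.size(),
                            raw.data(), &outLen, nullptr) == LZO_E_OK
           && outLen == rawSize;
}
```

With the slide's numbers, a 3:1 ratio cuts the 80 ms transfer of a 1 MB voxel to roughly 27 ms, which is why the cheap decompression pays off.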

  16. Load Balancing
  • Load balancing
    • Demand-driven distribution of image tiles (32x32 pixels)
    • Buffering of work tiles on the client
      • Avoids communication latency
  • Frame-to-frame coherence improves caching
    • Keep rays on the same client
    • Simple: keep tiles on the same client (implemented)
    • Better: assign tiles based on reprojected pixels (future)
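
A sketch of a demand-driven tile scheduler with the simple "keep tiles on the same client" affinity described above: idle clients request 32x32 tiles, and a tile is preferably handed back to the client that rendered it in the previous frame so that client's voxel cache stays warm. Class and method names are illustrative.

```cpp
#include <deque>
#include <unordered_map>
#include <vector>

struct Tile { int x, y; };                      // tile origin in the image

class TileScheduler {
public:
    TileScheduler(int width, int height, int tileSize = 32) {
        for (int y = 0; y < height; y += tileSize)
            for (int x = 0; x < width; x += tileSize)
                tiles_.push_back({x, y});
    }

    // Called at the start of each frame.
    void newFrame() { pending_.assign(tiles_.begin(), tiles_.end()); }

    // Hand out the next tile to a requesting client, preferring tiles this
    // client rendered last frame (better voxel-cache reuse); otherwise fall
    // back to plain demand-driven assignment so no client sits idle.
    bool nextTile(int clientId, Tile& out) {
        for (auto it = pending_.begin(); it != pending_.end(); ++it) {
            auto owner = lastOwner_.find(key(*it));
            if (owner != lastOwner_.end() && owner->second == clientId) {
                out = *it;
                pending_.erase(it);
                return true;
            }
        }
        if (pending_.empty()) return false;     // frame finished
        out = pending_.front();
        pending_.pop_front();
        lastOwner_[key(out)] = clientId;        // remember for the next frame
        return true;
    }

private:
    static long key(const Tile& t) { return (long(t.y) << 16) | t.x; }
    std::vector<Tile> tiles_;
    std::deque<Tile>  pending_;
    std::unordered_map<long, int> lastOwner_;
};
```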

  17. Results
  • Setup
    • Seven dual Pentium-III 800-866 MHz machines as rendering clients
    • 100 Mbit Fast Ethernet
    • One display & model server (same machine)
      • Gigabit Ethernet (already necessary for the pixel data)
  • Power plant performance
    • 3-6 fps in the pure C implementation
    • 6-12 fps with SSE support

  18. Animation: Framerate vs. Bandwidth
  → Latency hiding works!

  19. Scalability
  • Server bottleneck after 12 CPUs → distribute the model server!

  20. Performance: Detail Views
  Framerate (640x480): 3.9-4.7 fps (seven dual P-III 800-866 MHz machines, no SSE)

  21. Shadows and Reflections
  Framerate: 1.4-2.2 fps (no SSE)

  22. Demo

  23. Conclusions
  • IRT works great for highly complex models!
  • The distribution issues can be solved
  • At least as fast as sophisticated hardware techniques
  • Less preprocessing
  • Cheap
  • Simple & easy to extend (shadows, reflections, shading, …)

  24. Future Work
  • Smaller cache granularity
  • Distributed scene server
  • Cache-coherent load balancing
  • Dynamic scenes & instances
  • Hardware support for ray tracing

  25. Acknowledgments
  • Anselmo Lastra, UNC
    • Power plant reference model
  • … other complex models are welcome…

  26. Questions?
  For further information visit http://graphics.cs.uni-sb.de/rtrt

  27. Four Power Plants (50 million tris)

  28. Detailed View of Power Plant
  Framerate: 4.7 fps (seven dual P-III 800-866 MHz machines, no SSE)

  29. Detail View: Furnace
  Framerate: 3.9 fps (no SSE)

  30. Overview
  • Reference Model
  • Previous Work
  • Distribution Issues
  • Massive Model Issues
  • Images & Demo
