
Interactive Distributed Ray Tracing of Highly Complex Models

Ingo Wald

University of Saarbrücken

http://graphics.cs.uni-sb.de/~wald

http://graphics.cs.uni-sb.de/rtrt


Reference Model (12.5 million tris)


Power Plant: Detail Views


Previous Work

  • Interactive Rendering of Massive Models (UNC)

    • Framework of algorithms

      • Textured-depth-meshes (96% reduction in #tris)

      • View-Frustum Culling & LOD (50% each)

      • Hierarchical occlusion maps (10%)

    • Extensive preprocessing required

      • Entire model: ~3 weeks (estimated)

    • Framerate (Onyx): 5 to 15 fps

    • Needs shared-memory supercomputer


Previous Work II

  • Memory-Coherent Ray Tracing, Pharr et al. (Stanford)

    • Explicit cache management for rays and geometry

      • Extensive reordering and scheduling

      • Too slow for interactive rendering

    • Provides global illumination

  • Parallel Ray Tracing, Parker et al. (Utah) & Muuss (ARL)

    • Needs shared-memory supercomputer

  • Interactive Rendering with Coherent Ray Tracing (Saarbrücken, EG 2001)

    • IRT on (cheap) PC systems

    • Avoiding CPU stalls is crucial


Previous Work: Lessons Learned…

  • Rasterization is possible for massive models… but not ‘straightforward’ (UNC)

  • Interactive Ray Tracing is possible (Utah, Saarbrücken)

    • Easy to parallelize

    • Cost is only logarithmic in scene size

  • Conclusion: Parallel, Interactive Ray Tracing should work great for Massive Models


Parallel IRT

  • Parallel Interactive Ray Tracing

    • Supercomputer: more threads…

    • PCs: Distributed IRT on a CoW (cluster of workstations)

  • Distributed CoW: Need fast access to scene data

  • Simplistic access to scene data

    • mmap + caching, all done automatically by the OS (see the sketch after this list)

    • Either: Replicate scene

      • Extremely inflexible

    • Or: Access to single copy of scene over NFS (mmap)

      • Network issues: Latencies/Bandwidth
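
For concreteness, the "simplistic" access the last bullets describe amounts to mapping one big scene file on an NFS share and letting the OS page geometry in on demand. Below is a minimal POSIX sketch under that assumption; the Triangle layout and function name are illustrative, not the RTRT code. The next slide explains why this breaks down for massive models.

```cpp
// Illustrative POSIX sketch of the "simplistic" approach: map one big scene
// file that lives on an NFS share and let the OS page triangles in on demand.
// The Triangle layout and function name are assumptions, not the RTRT code.
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct Triangle { float v[3][3]; };   // assumed on-disk triangle record

const Triangle* mapScene(const char* nfsPath, std::size_t& triCountOut)
{
    int fd = open(nfsPath, O_RDONLY);
    if (fd < 0) return nullptr;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }

    // The OS now does all caching and paging. Every touch of a non-resident
    // page stalls the whole process until the data arrives over NFS, and on a
    // 32-bit machine the call simply fails once the model exceeds the ~2 GB
    // address space (the next slide's two complaints).
    void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                         // the mapping stays valid after close
    if (base == MAP_FAILED) return nullptr;

    triCountOut = std::size_t(st.st_size) / sizeof(Triangle);
    return static_cast<const Triangle*>(base);
}
```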


Simplistic Approach

Caching via OS support won’t work:

  • OS can’t even address more than 2 GB of data…

    • Massive Models >> 2 GB!

    • Also an issue when replicating the scene…

  • Process stalls due to demand paging

    • Stalls are very expensive!

      • Dual 1 GHz PIII: 1 ms stall = 1 million cycles = about 1000 rays!

    • OS automatically stalls the process → reordering impossible…


Distributed Scene Access

  • Simplistic approach doesn’t work…

  • Need ‘manual’ caching and memory management


Caching Scene Data

  • 2-Level Hierarchy of BSP Trees

    • Caching based on self-contained "voxels"

    • Clients need only the top-level BSP (a few KB)

    • Straightforward implementation… (see the traversal sketch below)
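
A minimal sketch of how such a two-level traversal might look, assuming a fully replicated top-level BSP and a per-client voxel cache. All type and function names (TopLevelBSP, VoxelCache, fetchVoxelFromServer, intersectLocalBSP) are hypothetical, and the cache miss blocks here for simplicity; the real system suspends the ray instead (see "Hiding CPU Stalls").

```cpp
// Illustrative sketch only: a replicated top-level BSP selects the voxels a
// ray passes through; each voxel is a self-contained unit (local BSP plus
// triangles) that is fetched into a client-side cache on demand.
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

struct Ray   { float org[3], dir[3]; };
struct Hit   { float t; std::uint32_t triangle; };
struct Voxel { /* local BSP tree + triangle data */ };

using VoxelId = std::uint32_t;

// Hypothetical helpers standing in for the real client/server machinery.
Voxel*             fetchVoxelFromServer(VoxelId id);                 // network fetch
std::optional<Hit> intersectLocalBSP(const Voxel& v, const Ray& r);  // per-voxel BSP

// Top-level BSP: only a few KB, so every client keeps a full copy.
struct TopLevelBSP {
    std::vector<VoxelId> voxelsAlong(const Ray& ray) const;  // front-to-back order
};

// Client-side voxel cache (eviction omitted).
struct VoxelCache {
    std::unordered_map<VoxelId, Voxel*> resident;
};

std::optional<Hit> traceRay(const Ray& ray, const TopLevelBSP& top, VoxelCache& cache)
{
    for (VoxelId id : top.voxelsAlong(ray)) {
        auto it = cache.resident.find(id);
        Voxel* v = (it != cache.resident.end()) ? it->second : nullptr;
        if (!v) {
            // Cache miss. The real system suspends the ray and fetches the
            // voxel asynchronously (see "Hiding CPU Stalls"); here we block.
            v = fetchVoxelFromServer(id);
            cache.resident[id] = v;
        }
        if (auto hit = intersectLocalBSP(*v, ray))
            return hit;          // voxels are visited front-to-back: first hit wins
    }
    return std::nullopt;         // ray left the scene
}
```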



Caching Scene Data

  • Preprocessing: Splitting Into Voxels

    • Simple spatial sorting (BSP-tree construction)

    • Out-of-core algorithm due to model size (see the sketch after this list)

      • File-size limit and address space (2 GB)

    • Simplistic implementation: 2.5 hours

  • Model Server

    • One machine serves the entire model

      → Single server = potential bottleneck!

    • Could easily be distributed
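
A rough sketch of what such an out-of-core split could look like: triangles are streamed from a flat file, bucketed by an axis-aligned splitting plane into per-child files, and the recursion bottoms out when a chunk fits a size budget and becomes a self-contained voxel. The 4 MB budget, file layout and helper names are my assumptions; the slides do not describe the actual 2.5-hour preprocessing tool in more detail.

```cpp
// Hypothetical out-of-core split: stream triangles, partition by a splitting
// plane into per-child files, recurse until a chunk fits the voxel budget.
// Error handling omitted for brevity.
#include <cstddef>
#include <cstdio>
#include <string>

struct Tri { float v[3][3]; };                       // raw triangle record on disk

const std::size_t kMaxVoxelBytes = std::size_t(4) << 20;    // assumed ~4 MB per voxel

float chooseSplitPlane(const std::string& triFile, int axis);   // e.g. spatial median
float centroid(const Tri& t, int axis);
void  buildAndWriteVoxel(const std::string& triFile, const std::string& voxelFile);

void splitRecursive(const std::string& triFile, std::size_t fileBytes,
                    int depth, const std::string& outPrefix)
{
    if (fileBytes <= kMaxVoxelBytes) {
        // Small enough to fit in core: build the voxel's local BSP and write it.
        buildAndWriteVoxel(triFile, outPrefix + ".voxel");
        return;
    }

    const int   axis  = depth % 3;                   // round-robin split axis
    const float split = chooseSplitPlane(triFile, axis);

    std::FILE* in    = std::fopen(triFile.c_str(), "rb");
    std::FILE* left  = std::fopen((outPrefix + "_l.tri").c_str(), "wb");
    std::FILE* right = std::fopen((outPrefix + "_r.tri").c_str(), "wb");
    std::size_t lBytes = 0, rBytes = 0;

    Tri t;
    while (std::fread(&t, sizeof(Tri), 1, in) == 1) {        // stream; never load all
        std::FILE* out = (centroid(t, axis) < split) ? left : right;
        std::fwrite(&t, sizeof(Tri), 1, out);
        (out == left ? lBytes : rBytes) += sizeof(Tri);
    }
    std::fclose(in); std::fclose(left); std::fclose(right);

    splitRecursive(outPrefix + "_l.tri", lBytes, depth + 1, outPrefix + "_l");
    splitRecursive(outPrefix + "_r.tri", rBytes, depth + 1, outPrefix + "_r");
}
```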


Hiding CPU Stalls

  • Caching alone does not prevent stalls!

  • Avoiding Stalls → Reordering

    • Suspend rays that would stall on missing data (see the sketch after this list)

    • Fetch missing data asynchronously!

    • Immediately continue with another ray

      • Potentially no CPU stall at all!

    • Resume stalled rays after the data is available

  • Can only hide ‘some’ latency

     → Minimize voxel-fetching latencies
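
A minimal sketch of the suspend/resume bookkeeping, assuming per-voxel wait queues and an asynchronous fetch request; the names and data structures are illustrative, not the RTRT scheduler.

```cpp
// Sketch: a ray that needs a missing voxel is parked in a per-voxel queue and
// the async fetch is issued once; other runnable rays keep the CPU busy, and
// the parked rays become runnable again when the voxel arrives.
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

struct RayState;                 // a ray plus its current traversal position
using VoxelId = std::uint32_t;

class RayScheduler {
public:
    // Called on a cache miss: park the ray and request the voxel (once).
    void suspend(VoxelId missing, RayState* ray) {
        auto& waiters = waiting_[missing];
        if (waiters.empty())
            requestVoxelAsync(missing);   // non-blocking fetch from the model server
        waiters.push_back(ray);
    }

    // Called when a fetched voxel has been installed in the cache.
    void onVoxelArrived(VoxelId id) {
        for (RayState* r : waiting_[id])
            runnable_.push_back(r);       // parked rays become runnable again
        waiting_.erase(id);
    }

    // Next ray that can make progress, or nullptr if everything is waiting.
    RayState* nextRunnable() {
        if (runnable_.empty()) return nullptr;
        RayState* r = runnable_.front();
        runnable_.pop_front();
        return r;
    }

private:
    void requestVoxelAsync(VoxelId id);   // issues the network request
    std::unordered_map<VoxelId, std::vector<RayState*>> waiting_;
    std::deque<RayState*> runnable_;
};
```

The point of the design is that the pool of runnable rays keeps the CPU busy while the network round trip is in flight, so a miss costs a queue operation instead of a million-cycle stall.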


Reducing Latencies

  • Reduce Network Latencies

    • Prefetching?

      • Hard to predict data accesses several ms in advance!

    • Latency is dominated by transmission time

      (100 Mbit/s → 1 MB = 80 ms = 160 million cycles! See the calculation below.)

    • Reduce transmitted data volume
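
As a quick sanity check on the figure above (my arithmetic, not from the slides), the transmission time for a 1 MB voxel over a 100 Mbit/s link is

$t = \dfrac{1\,\mathrm{MB} \times 8\,\mathrm{bit/byte}}{100\,\mathrm{Mbit/s}} = 80\,\mathrm{ms}$

i.e. about 80 million cycles per 1 GHz CPU, or roughly the quoted 160 million cycles if both CPUs of a dual node are counted. Compression attacks exactly this term.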


Reducing Bandwidth

  • Compression of Voxel Data

    • The LZO library provides roughly 3:1 compression

    • Compared to the original transmission time, the decompression cost is negligible!

  • Dual-CPU system: Sharing of Voxel Cache

    • Amortize bandwidth, storage and decompression effort over both CPUs… (see the sketch after this list)

      → Even better for more CPUs
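
A sketch of the client-side fetch path under these two ideas, assuming the LZO1X decompression call (lzo1x_decompress) from the LZO library named above; the shared cache structure, locking and all other names are illustrative assumptions, not the RTRT implementation.

```cpp
// Sketch: decompress a voxel received from the model server and install it in
// a cache shared by both CPUs of a node, so bandwidth, storage and
// decompression cost are paid once per node instead of once per CPU.
// Note: lzo_init() must have been called once at program start.
#include <lzo/lzo1x.h>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

using VoxelId = std::uint32_t;

struct SharedVoxelCache {
    std::mutex mutex;                                                // one cache per node,
    std::unordered_map<VoxelId, std::vector<unsigned char>> voxels;  // shared by both CPUs
};

bool installVoxel(SharedVoxelCache& cache, VoxelId id,
                  const unsigned char* compressed, lzo_uint compressedLen,
                  lzo_uint uncompressedLen)
{
    std::vector<unsigned char> data(uncompressedLen);
    lzo_uint outLen = uncompressedLen;
    if (lzo1x_decompress(compressed, compressedLen, data.data(), &outLen,
                         nullptr) != LZO_E_OK || outLen != uncompressedLen)
        return false;                                 // corrupted transfer

    std::lock_guard<std::mutex> lock(cache.mutex);
    cache.voxels[id] = std::move(data);               // the second CPU reuses it for free
    return true;
}
```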


Load Balancing

  • Load Balancing

    • Demand-driven distribution of tiles (32x32) (see the scheduling sketch after this list)

    • Buffering of work tiles on the client

      • Avoid communication latency

  • Frame-to-Frame Coherence

    → Improves caching

    • Keep rays on the same client

      • Simple: Keep tiles on the same client (implemented)

      • Better: Assign tiles based on reprojected pixels (future)
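
A sketch of demand-driven tile scheduling with the simple "keep tiles on the same client" affinity: a pending tile is preferentially handed to the client that rendered it last frame, so that client's voxel cache likely already holds the needed geometry. The data structures and the linear affinity search are illustrative assumptions, not the implemented scheduler.

```cpp
// Sketch of a master-side tile scheduler: demand-driven 32x32 tiles with
// frame-to-frame affinity to improve client-side voxel caching.
#include <cstdint>
#include <deque>
#include <unordered_map>

struct Tile { int x, y; };               // 32x32 pixel block, identified by its origin

class TileScheduler {
public:
    void beginFrame(int imageW, int imageH) {
        pending_.clear();                // lastOwner_ is kept across frames on purpose
        for (int y = 0; y < imageH; y += 32)
            for (int x = 0; x < imageW; x += 32)
                pending_.push_back({x, y});
    }

    // Called when a client asks for more work (clients keep a small local
    // buffer of tiles to hide the request latency).
    bool requestTile(int clientId, Tile& out) {
        // First try a tile this client rendered last frame (cache affinity).
        for (auto it = pending_.begin(); it != pending_.end(); ++it) {
            auto owner = lastOwner_.find(key(*it));
            if (owner != lastOwner_.end() && owner->second == clientId) {
                out = *it;
                pending_.erase(it);
                return true;
            }
        }
        if (pending_.empty()) return false;             // frame done
        out = pending_.front(); pending_.pop_front();   // otherwise: any pending tile
        lastOwner_[key(out)] = clientId;                // remember for the next frame
        return true;
    }

private:
    static std::uint64_t key(const Tile& t) {
        return (std::uint64_t(t.x) << 32) | std::uint32_t(t.y);
    }
    std::deque<Tile> pending_;
    std::unordered_map<std::uint64_t, int> lastOwner_;  // tile -> client of last frame
};
```

The slide's "better" variant would replace the per-tile affinity map with an assignment based on where last frame's pixels reproject, which tracks camera motion more closely.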


Results

  • Setup

    • Seven dual Pentium-III 800-866 MHz machines as rendering clients

      • 100 Mbit Fast Ethernet

    • One display & model server (same machine)

      • Gigabit Ethernet (already necessary for the pixel data)

  • Powerplant Performance

    • 3-6 fps in pure C implementation

    • 6-12 fps with SSE support


Animation: Framerate vs. Bandwidth

→ Latency-hiding works!


Scalability

Server bottleneck after 12 CPUs

→ Distribute the model server!


Performance: Detail Views

Framerate (640x480): 3.9-4.7 fps (seven dual P-III 800-866 MHz CPUs, NO SSE)


Shadows and Reflections

Framerate: 1.4-2.2 fps (NO SSE)



Conclusions

  • IRT works great for highly complex models!

    • Distribution issues can be solved

    • At least as fast as sophisticated hardware techniques

    • Less preprocessing

    • Cheap

    • Simple & easy to extend (shadows, reflections, shading,…)


Future Work

  • Smaller cache granularity

  • Distributed scene server

  • Cache-coherent load balancing

  • Dynamic scenes & instances

  • Hardware support for ray-tracing


Acknowledgments

  • Anselmo Lastra, UNC

    • Power plant reference model

      … other complex models are welcome…


Questions?

For further information visit

http://graphics.cs.uni-sb.de/rtrt


Four Power Plants (50 million tris)


Detailed View of Power Plant

Framerate: 4.7 fps (seven dual P-III 800-866 MHz CPUs, NO SSE)


Detail View: Furnace

Framerate: 3.9 fps, NO SSE


Overview

  • Reference Model

  • Previous Work

  • Distribution Issues

  • Massive Model Issues

  • Images & Demo