
Interactive Distributed Ray Tracing of Highly Complex Models

Ingo Wald

University of Saarbrücken

http://graphics.cs.uni-sb.de/~wald

http://graphics.cs.uni-sb.de/rtrt


Reference Model (12.5 million tris)


Power Plant - Detail Views


Previous Work

  • Interactive Rendering of Massive Models (UNC)

    • Framework of algorithms

      • Textured-depth-meshes (96% reduction in #tris)

      • View-Frustum Culling & LOD (50% each)

      • Hierarchical occlusion maps (10%)

    • Extensive preprocessing required

      • Entire model: ~3 weeks (estimated)

    • Framerate (Onyx): 5 to 15 fps

    • Needs shared-memory supercomputer


Previous Work II

  • Memory-Coherent Ray Tracing, Pharr et al. (Stanford)

    • Explicit cache management for rays and geometry

      • Extensive reordering and scheduling

      • Too slow for interactive rendering

    • Provides global illumination

  • Parallel Ray Tracing, Parker et al. (Utah) & Muuss (ARL)

    • Needs shared-memory supercomputer

  • Interactive Rendering with Coherent Ray Tracing (Saarbrücken, EG 2001)

    • IRT on (cheap) PC systems

    • Avoiding CPU stalls is crucial


Previous Work: Lessons Learned…

  • Rasterization is possible for massive models… but not 'straightforward' (UNC)

  • Interactive Ray Tracing is possible (Utah, Saarbrücken)

    • Easy to parallelize

    • Cost is only logarithmic in scene size

  • Conclusion: Parallel, Interactive Ray Tracing should work great for Massive Models


Parallel IRT

  • Parallel Interactive Ray Tracing

    • Supercomputer: more threads…

    • PCs: Distributed IRT on a CoW (cluster of workstations)

  • Distributed CoW: Need fast access to scene data

  • Simplistic access to scene data

    • mmap + caching, all handled automatically by the OS (see the sketch after this list)

    • Either: Replicate scene

      • Extremely inflexible

    • Or: Access to single copy of scene over NFS (mmap)

      • Network issues: Latencies/Bandwidth
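
To make the "simplistic" variant concrete, here is a minimal sketch of OS-managed scene access via mmap; the scene path and the Triangle layout are purely illustrative, and the code only shows the mechanism that the next slide argues against.

```cpp
// Simplistic scene access: map the whole scene file read-only and let the OS
// page geometry in on demand (over NFS, each page fault becomes a network
// round trip). Path and Triangle layout are placeholders, not the real format.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>

struct Triangle { float v0[3], v1[3], v2[3]; }; // illustrative layout

int main() {
    const char* path = "/net/server/scene/powerplant.tri"; // hypothetical NFS path
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    // On a 32-bit system this mapping is limited by the ~2 GB user address
    // space -- one reason the approach breaks down for massive models.
    void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    const Triangle* tris = static_cast<const Triangle*>(base);
    size_t numTris = st.st_size / sizeof(Triangle);

    // Any access to tris[i] may trigger a page fault; the process then stalls
    // until the page arrives -- the ray tracer cannot reorder around it.
    (void)tris; (void)numTris;

    munmap(base, st.st_size);
    close(fd);
    return 0;
}
```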


Simplistic Approach

Caching via OS support won’t work:

  • OS can’t even address more than 2Gb of data…

    • Massive models >> 2 GB!

    • Also an issue when replicating the scene…

  • Process stalls due to demand paging

    • Stalls are very expensive!

      • Dual 1 GHz Pentium-III: a 1 ms stall = 1 million cycles ≈ 1000 rays (at roughly 1000 cycles per ray)!

    • The OS stalls the process automatically → reordering is impossible…


Distributed Scene Access

  • Simplistic approach doesn’t work…

  • Need ‘manual’ caching and memory management


Caching Scene Data

  • 2-Level Hierarchy of BSP-Trees

    • Caching based on self-contained "voxels"

    • Clients need only the top-level BSP (a few KB)

    • Straightforward implementation (see the sketch below)…
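
A minimal sketch of this two-level organization, assuming a hypothetical VoxelCache class and data layout; none of the names below are taken from the actual system.

```cpp
#include <memory>
#include <unordered_map>
#include <vector>

// Top-level BSP: small enough (a few KB) to replicate on every client.
// Its leaves store voxel IDs instead of triangles.
struct TopLevelNode {
    int splitAxis = -1;          // -1 marks a leaf
    float splitPos = 0.0f;
    int children[2] = {-1, -1};  // indices into the node array
    std::vector<int> voxelIds;   // leaf: voxels overlapping this cell
};

// A self-contained voxel: geometry plus its own low-level BSP,
// fetched from the model server as one unit.
struct Voxel {
    std::vector<float> triangles;   // illustrative storage
    std::vector<char>  lowLevelBsp; // serialized low-level BSP
};

// Hypothetical client-side cache keyed by voxel ID.
class VoxelCache {
public:
    // Returns the voxel if resident, or nullptr after scheduling an
    // asynchronous fetch from the model server (not shown here).
    const Voxel* lookup(int voxelId) {
        auto it = cache_.find(voxelId);
        if (it != cache_.end()) return it->second.get();
        requestAsync(voxelId);
        return nullptr;
    }
private:
    void requestAsync(int /*voxelId*/) { /* send request to the model server */ }
    std::unordered_map<int, std::unique_ptr<Voxel>> cache_;
};
```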



Caching Scene Data

  • Preprocessing: Splitting Into Voxels

    • Simple spatial sorting (bsp-tree construction)

    • Out-of-core algorithm due to model size

      • File-size limit and address space (2 GB)

    • Simplistic implementation: 2.5 hours (splitting pass sketched after this list)

  • Model Server

    • One machine serves entire model

      → Single server = potential bottleneck!

    • Could easily be distributed
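
A rough sketch of the out-of-core splitting pass under simplified assumptions: triangles are streamed from the model file once and appended to one file per voxel, so neither the model nor any voxel must fit in memory. A regular grid over triangle centroids stands in for the actual BSP-based split, and all file names are illustrative.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Illustrative on-disk triangle record.
struct Tri { float v[3][3]; };

static void centroid(const Tri& t, float c[3]) {
    for (int k = 0; k < 3; ++k)
        c[k] = (t.v[0][k] + t.v[1][k] + t.v[2][k]) / 3.0f;
}

// Stream the model once and append every triangle to the file of the voxel
// containing its centroid. A regular grid stands in for the BSP split; the
// real preprocessing recursively splits until voxels are small enough.
void splitIntoVoxels(const char* modelFile, const float sceneMin[3],
                     const float cellSize[3], const int res[3]) {
    FILE* in = std::fopen(modelFile, "rb");
    if (!in) return;

    Tri t;
    while (std::fread(&t, sizeof(Tri), 1, in) == 1) {
        float c[3]; centroid(t, c);
        int idx[3];
        for (int k = 0; k < 3; ++k) {
            idx[k] = (int)std::floor((c[k] - sceneMin[k]) / cellSize[k]);
            if (idx[k] < 0) idx[k] = 0;
            if (idx[k] >= res[k]) idx[k] = res[k] - 1;
        }
        int voxelId = (idx[2] * res[1] + idx[1]) * res[0] + idx[0];

        // Append to that voxel's file; reopening per triangle keeps the sketch
        // short (a real pass would keep a pool of open files).
        std::string name = "voxel_" + std::to_string(voxelId) + ".tri";
        if (FILE* out = std::fopen(name.c_str(), "ab")) {
            std::fwrite(&t, sizeof(Tri), 1, out);
            std::fclose(out);
        }
    }
    std::fclose(in);
}
```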


Hiding CPU Stalls

  • Caching alone does not prevent stalls!

  • Avoiding Stalls → Reordering

    • Suspend rays that would stall on missing data (see the sketch after this list)

    • Fetch missing data asynchronously!

    • Immediately continue with other ray

      • Potentially no CPU stall at all!

    • Resume stalled rays after data is available

  • Can only hide ‘some’ latency

     → Minimize voxel-fetching latencies
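
A minimal sketch of the suspend/resume scheme; the stub functions stand in for the voxel cache and the BSP traversal, and all names are illustrative rather than the system's real interface.

```cpp
#include <deque>
#include <unordered_map>
#include <vector>

struct Ray   { float org[3], dir[3]; };  // illustrative ray record
struct Voxel { /* geometry + low-level BSP, as in the caching sketch */ };

// Placeholder stubs for pieces shown or implied elsewhere.
const Voxel* lookupOrFetchAsync(int) { return nullptr; } // client voxel cache
int  nextVoxelAlongRay(const Ray&)   { return -1; }      // top-level BSP step
bool intersectVoxel(Ray&, const Voxel&) { return true; } // low-level BSP step

// Rays parked on a missing voxel, grouped by the voxel they are waiting for.
struct SuspendedRays {
    std::unordered_map<int, std::vector<Ray>> waiting;

    void suspend(int voxelId, const Ray& r) { waiting[voxelId].push_back(r); }

    // Called when an asynchronous voxel fetch completes.
    void resume(int voxelId, std::deque<Ray>& active) {
        auto it = waiting.find(voxelId);
        if (it == waiting.end()) return;
        active.insert(active.end(), it->second.begin(), it->second.end());
        waiting.erase(it);
    }
};

// Core idea: never block on a missing voxel -- park the ray, keep working.
void traceActiveRays(std::deque<Ray>& active, SuspendedRays& suspended) {
    while (!active.empty()) {
        Ray ray = active.front();
        active.pop_front();

        int voxelId = nextVoxelAlongRay(ray);
        if (voxelId < 0) continue;               // ray left the scene

        const Voxel* voxel = lookupOrFetchAsync(voxelId);
        if (!voxel) {                            // cache miss: fetch is in flight
            suspended.suspend(voxelId, ray);     // resume() re-activates it later
            continue;                            // no CPU stall, next ray
        }
        if (!intersectVoxel(ray, *voxel))        // no hit in this voxel:
            active.push_back(ray);               // keep traversing
    }
}
```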


Reducing Latencies

  • Reduce Network Latencies

    • Prefetching?

      • Hard to predict data accesses several ms in advance!

    • Latency is dominated by transmission time

      (100 Mbit/s: transmitting 1 MB takes 80 ms, i.e. 160 million cycles on a dual 1 GHz machine!)

    • Reduce transmitted data volume


Reducing Bandwidth

  • Compression of Voxel Data

    • The LZO library provides about 3:1 compression (see the sketch after this list)

    • Compared to the original transmission time, the decompression cost is negligible!

  • Dual-CPU system: Sharing of Voxel Cache

    • Amortize bandwidth, storage and decompression effort over both CPUs…

      → Even better with more CPUs
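
A minimal sketch of voxel (de)compression with LZO1X from the LZO library; the header path assumes liblzo2, and the framing (transmitting the uncompressed size alongside the compressed block) is an assumption, not the system's actual wire format.

```cpp
#include <lzo/lzo1x.h>   // liblzo2; miniLZO users would include "minilzo.h"
#include <vector>

// Compress a serialized voxel on the model server before sending it.
// Returns an empty vector on failure.
std::vector<unsigned char> compressVoxel(const std::vector<unsigned char>& raw) {
    // Worst-case output size as recommended by the LZO documentation.
    std::vector<unsigned char> out(raw.size() + raw.size() / 16 + 64 + 3);
    std::vector<unsigned char> wrkmem(LZO1X_1_MEM_COMPRESS);

    lzo_uint outLen = out.size();
    if (lzo1x_1_compress(raw.data(), raw.size(),
                         out.data(), &outLen, wrkmem.data()) != LZO_E_OK)
        return {};
    out.resize(outLen);
    return out;
}

// Decompress on the rendering client; the uncompressed size is assumed to be
// transmitted alongside the compressed block (framing not shown).
std::vector<unsigned char> decompressVoxel(const std::vector<unsigned char>& packed,
                                           size_t uncompressedSize) {
    std::vector<unsigned char> out(uncompressedSize);
    lzo_uint outLen = out.size();
    if (lzo1x_decompress(packed.data(), packed.size(),
                         out.data(), &outLen, nullptr) != LZO_E_OK)
        return {};
    out.resize(outLen);
    return out;
}

int main() {
    if (lzo_init() != LZO_E_OK) return 1;   // must be called once at startup
    // ... serve / fetch voxels, calling compressVoxel / decompressVoxel ...
    return 0;
}
```

At the roughly 3:1 ratio quoted above, the 80 ms transmission time for a 1 MB voxel drops to under 30 ms, which is why the decompression overhead hardly matters.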


Load Balancing

  • Load Balancing

    • Demand-driven distribution of image tiles (32×32 pixels; see the sketch after this list)

    • Buffering of work tiles on the client

      • Avoid communication latency

  • Frame-to-Frame Coherence

    → Improves caching

    • Keep rays on the same client

      • Simple: Keep tiles on the same client (implemented)

      • Better: Assign tiles based on reprojected pixels (future)
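
A minimal sketch of demand-driven tile scheduling with the simple "keep tiles on the same client" heuristic; the master-side TileScheduler class and its interface are illustrative assumptions, not the system's actual code.

```cpp
#include <deque>
#include <optional>
#include <unordered_map>
#include <vector>

struct Tile { int x, y; };   // top-left corner of a 32x32 pixel tile

// Master-side scheduler: hands out tiles on demand and prefers to give each
// client the tiles it rendered last frame, so its voxel cache stays warm.
class TileScheduler {
public:
    void beginFrame(int width, int height) {
        unassigned_.clear();
        for (int y = 0; y < height; y += 32)
            for (int x = 0; x < width; x += 32)
                unassigned_.push_back({x, y});
    }

    // Demand-driven: a client asks whenever its local tile buffer runs low.
    std::optional<Tile> request(int clientId) {
        std::optional<Tile> t = preferOwned(clientId);
        if (!t && !unassigned_.empty()) {
            t = unassigned_.front();
            unassigned_.pop_front();
        }
        if (t) currentFrame_[clientId].push_back(*t);
        return t;                               // nullopt: frame is finished
    }

    void endFrame() { lastFrame_.swap(currentFrame_); currentFrame_.clear(); }

private:
    // Re-issue a tile the client owned last frame, if it is still unassigned.
    std::optional<Tile> preferOwned(int clientId) {
        auto& owned = lastFrame_[clientId];
        while (!owned.empty()) {
            Tile t = owned.back();
            owned.pop_back();
            if (removeFromUnassigned(t)) return t;
        }
        return std::nullopt;
    }

    bool removeFromUnassigned(const Tile& t) {
        for (auto it = unassigned_.begin(); it != unassigned_.end(); ++it)
            if (it->x == t.x && it->y == t.y) { unassigned_.erase(it); return true; }
        return false;
    }

    std::deque<Tile> unassigned_;
    std::unordered_map<int, std::vector<Tile>> lastFrame_, currentFrame_;
};
```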


Results

  • Setup

    • Seven dual Pentium-III 800-866 MHz machines as rendering clients

      • 100 Mbit FastEthernet

    • One display & model server (same machine)

      • Gigabit Ethernet (already necessary for the pixel data)

  • Power Plant Performance

    • 3-6 fps with the pure C implementation

    • 6-12 fps with SSE support


Animation: Framerate vs. Bandwidth

 → Latency hiding works!


Scalability

Server bottleneck after 12 CPUs

 → Distribute the model server!


Performance: Detail Views

Framerate (640×480): 3.9-4.7 fps (seven dual P-III 800-866 MHz machines, no SSE)


Shadows and Reflections

Framerate: 1.4-2.2 fps (no SSE)



Conclusions

  • IRT works great for highly complex models!

    • Distribution issues can be solved

    • At least as fast as sophisticated hardware techniques

    • Less preprocessing

    • Cheap

    • Simple & easy to extend (shadows, reflections, shading,…)


Future Work

  • Smaller cache granularity

  • Distributed scene server

  • Cache-coherent load balancing

  • Dynamic scenes & instances

  • Hardware support for ray-tracing


Acknowledgments

  • Anselmo Lastra, UNC

    • Power plant reference model

      … other complex models are welcome…


Questions?

For further information visit

http://graphics.cs.uni-sb.de/rtrt


Four Power Plants (50 million tris)


Detailed View of Power Plant

Framerate: 4.7 fps (seven dual P-III 800-866 MHz machines, no SSE)


Detail View: Furnace

Framerate: 3.9 fps (no SSE)


Overview

  • Reference Model

  • Previous Work

  • Distribution Issues

  • Massive Model Issues

  • Images & Demo

