
Interactive Distributed Ray Tracing of Highly Complex Models

Ingo Wald

University of Saarbrücken

http://graphics.cs.uni-sb.de/~wald

http://graphics.cs.uni-sb.de/rtrt

Previous Work
  • Interactive Rendering of Massive Models (UNC)
    • Framework of algorithms
      • Textured-depth-meshes (96% reduction in #tris)
      • View-Frustum Culling & LOD (50% each)
      • Hierarchical occlusion maps (10%)
    • Extensive preprocessing required
      • Entire model: ~3 weeks (estimated)
    • Framerate (Onyx): 5 to 15 fps
    • Needs shared-memory supercomputer
Previous Work II
  • Memory Coherent RT, Pharr (Stanford)
    • Explicit cache management for rays and geometry
      • Extensive reordering and scheduling
      • Too slow for interactive rendering
    • Provides global illumination
  • Parallel Ray Tracing, Parker et al. (Utah) & Muuss (ARL)
    • Needs shared-memory supercomputer
  • Interactive Rendering with Coherent Ray Tracing (Saarbrücken, EG 2001)
    • IRT on (cheap) PC systems
    • Avoiding CPU stalls is crucial
Previous Work: Lessons Learned…
  • Rasterization is possible for massive models… but not ‘straightforward’ (UNC)
  • Interactive Ray Tracing is possible (Utah, Saarbrücken)
    • Easy to parallelize
    • Cost is only logarithmic in scene size
  • Conclusion: Parallel, Interactive Ray Tracing should work great for Massive Models
Parallel IRT
  • Parallel Interactive Ray Tracing
    • Supercomputer: more threads…
    • PCs: Distributed IRT on a cluster of workstations (CoW)
  • Distributed CoW: Need fast access to scene data
  • Simplistic access to scene data
    • mmap + caching, all handled automatically by the OS
    • Either: replicate the scene
      • Extremely inflexible
    • Or: access a single copy of the scene over NFS via mmap (see the sketch below)
      • Network issues: latency and bandwidth
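For illustration, a minimal sketch of this simplistic approach is shown below, assuming a flat triangle array on disk; the file layout and function names are made up and not the actual RTRT code. The OS demand-pages the mapped file, so every first touch of a page can stall the rendering thread until the data arrives over NFS.

```cpp
// Minimal sketch of the "simplistic" approach: mmap one shared scene file
// (e.g. exported over NFS) and let the OS demand-page it.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

struct Triangle { float v0[3], v1[3], v2[3]; };   // assumed on-disk layout

const Triangle* mapScene(const char* path, std::size_t& triCount) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    // From here on the OS does all the caching; a page miss stalls the caller.
    void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                                    // mapping remains valid
    if (base == MAP_FAILED) return nullptr;
    triCount = static_cast<std::size_t>(st.st_size) / sizeof(Triangle);
    return static_cast<const Triangle*>(base);
}
```

On the 32-bit machines used here, this mapping cannot even be established once the model exceeds the process address space, which is exactly the limitation the next slide points out.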
Simplistic Approach

Caching via OS support won’t work:

  • The OS can’t even address more than 2 GB of data…
    • Massive models >> 2 GB!
    • Also an issue when replicating the scene…
  • Process stalls due to demand paging
    • Stalls are very expensive!
      • Dual 1 GHz P-III: 1 ms stall = 1 million cycles ≈ 1000 rays!
    • The OS stalls the process automatically → reordering is impossible…
Distributed Scene Access
  • Simplistic approach doesn’t work…
  • Need ‘manual’ caching and memory management
Caching Scene Data
  • 2-Level Hierarchy of BSP-Trees
    • Caching based on self-contained “voxels”
    • Clients need only the top-level BSP tree (a few KB)
    • Straightforward implementation (structural sketch below)…
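A structural sketch of the two-level scheme follows; all type names and layouts are illustrative assumptions, not the actual RTRT data structures. The small top-level BSP tree is replicated on every client, its leaves reference self-contained voxels, and each voxel (its own local BSP tree plus triangles) enters a client-side cache on first use.

```cpp
// Two-level hierarchy: resident top-level BSP tree whose leaves name
// self-contained voxels that are loaded into a client-side cache on demand.
#include <memory>
#include <unordered_map>
#include <vector>

struct Triangle { float v[3][3]; };

struct Voxel {                        // the self-contained cache unit
    std::vector<int>      localBsp;   // flattened local BSP tree (placeholder)
    std::vector<Triangle> tris;
};

struct TopLevelNode {                 // a few KB in total, replicated on all clients
    bool  isLeaf;
    int   axis;   float split;
    int   child[2];                   // indices into the top-level node array
    int   voxelID;                    // valid only in leaves
};

class VoxelCache {
public:
    // Returns the voxel if resident; nullptr means it still has to be fetched
    // from the model server (see the following slides).
    Voxel* lookup(int voxelID) {
        auto it = cache_.find(voxelID);
        return it == cache_.end() ? nullptr : it->second.get();
    }
    void insert(int voxelID, std::unique_ptr<Voxel> v) {
        cache_[voxelID] = std::move(v);
    }
private:
    std::unordered_map<int, std::unique_ptr<Voxel>> cache_;
};
```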
Caching Scene Data
  • Preprocessing: Splitting Into Voxels
    • Simple spatial sorting (BSP-tree construction)
    • Out-of-core algorithm due to model size
      • File-size limit and address space (2 GB)
    • Simplistic implementation: 2.5 hours
  • Model Server
    • One machine serves the entire model (fetch request sketched below)

Single server = potential bottleneck!

    • Could easily be distributed
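As a rough illustration of the client/server interaction, the sketch below requests one voxel over an already connected TCP socket. The wire format (a 4-byte voxel id answered by a 4-byte payload length plus the voxel bytes) is an assumption for illustration, not the actual protocol.

```cpp
// Fetch one voxel from the model server over a connected TCP socket.
#include <arpa/inet.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>
#include <vector>

static bool readAll(int fd, void* buf, std::size_t n) {
    char* p = static_cast<char*>(buf);
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0) return false;                 // error or connection closed
        p += r; n -= static_cast<std::size_t>(r);
    }
    return true;
}

bool fetchVoxel(int sock, uint32_t voxelID, std::vector<char>& payload) {
    uint32_t id = htonl(voxelID);                 // request: voxel id
    if (write(sock, &id, sizeof(id)) != static_cast<ssize_t>(sizeof(id))) return false;
    uint32_t len = 0;                             // reply: length + raw voxel bytes
    if (!readAll(sock, &len, sizeof(len))) return false;
    payload.resize(ntohl(len));
    return readAll(sock, payload.data(), payload.size());
}
```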
Hiding CPU Stalls
  • Caching alone does not prevent stalls!
  • Avoiding stalls → reordering (see the sketch below)
    • Suspend rays that would stall on missing data
    • Fetch the missing data asynchronously!
    • Immediately continue with another ray
      • Potentially no CPU stall at all!
    • Resume stalled rays once their data is available
  • Can only hide ‘some’ latency

→ Minimize voxel-fetching latencies
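The sketch below illustrates the reordering idea; the scheduler structure and member names are assumptions for illustration, with the voxel-dependent parts left as stubs. A ray that would stall on a non-resident voxel is parked, an asynchronous fetch is issued, and the CPU immediately continues with the next ready ray.

```cpp
// Reordering sketch: suspend rays on missing voxels, fetch asynchronously,
// keep tracing other rays, and resume suspended rays once their data arrives.
#include <deque>
#include <unordered_map>
#include <vector>

struct Ray { int pixel; /* origin, direction, traversal state, ... */ };

class RayScheduler {
public:
    void push(const Ray& r) { ready_.push_back(r); }

    void run() {
        while (!ready_.empty() || !suspended_.empty()) {
            integrateArrivedVoxels();            // move resumable rays back to ready_
            if (ready_.empty()) continue;        // only suspended rays left: busy-wait
            Ray r = ready_.front(); ready_.pop_front();
            int missing = traceUntilMiss(r);     // trace as far as resident data allows
            if (missing < 0) continue;           // ray terminated, pixel written
            suspended_[missing].push_back(r);    // park the ray ...
            requestVoxelAsync(missing);          // ... and fetch without blocking
        }
    }
private:
    int  traceUntilMiss(Ray&) { return -1; }     // stub: would return missing voxel id
    void requestVoxelAsync(int) {}               // stub: would contact the model server
    void integrateArrivedVoxels() {}             // stub: would drain completed fetches

    std::deque<Ray> ready_;
    std::unordered_map<int, std::vector<Ray>> suspended_;
};
```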

Reducing Latencies
  • Reduce Network Latencies
    • Prefetching?
      • Hard to predict data accesses several ms in advance!
    • Latency is dominated by transmission time

(100 Mbit/s → 1 MB = 80 ms = 160 million cycles, as worked out below!)

    • Reduce the transmitted data volume
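To see where these numbers come from (reading the CPUs as the dual 1 GHz P-III mentioned earlier): transmitting a 1 MB voxel over a 100 Mbit/s link takes (1 MB × 8 bit/byte) / (100 Mbit/s) = 80 ms, i.e. about 80 million cycles per CPU, or roughly 160 million cycles of compute budget on a dual-CPU client. The wire latency of the LAN is a small fraction of that, so reducing the transmitted volume is the effective lever.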
Reducing Bandwidth
  • Compression of Voxel Data
    • The LZO library provides roughly 3:1 compression (decompression sketched below)
    • Compared to the original transmission time, the decompression cost is negligible!
  • Dual-CPU system: Sharing of Voxel Cache
    • Amortize bandwidth, storage and decompression effort over both CPUs…

Even better for more CPUs
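A hedged sketch of the decompression step follows, using the LZO1X decoder from liblzo2; the buffer handling and the assumption that the uncompressed size is sent along with the payload are illustrative, not the actual implementation.

```cpp
// Decompress one voxel payload received from the model server with LZO.
// Link with -llzo2; assumes the server also transmitted the uncompressed size.
#include <lzo/lzo1x.h>
#include <cstddef>
#include <mutex>
#include <vector>

bool decompressVoxel(const std::vector<unsigned char>& compressed,
                     std::size_t rawSize,
                     std::vector<unsigned char>& raw) {
    static std::once_flag once;
    std::call_once(once, [] { lzo_init(); });    // one-time library initialisation
    raw.resize(rawSize);
    lzo_uint outLen = static_cast<lzo_uint>(rawSize);
    int rc = lzo1x_decompress_safe(compressed.data(), compressed.size(),
                                   raw.data(), &outLen, nullptr);
    return rc == LZO_E_OK && outLen == rawSize;
}
```

On a dual-CPU client, both rendering threads would then insert the result into one shared voxel cache (e.g. the VoxelCache from the earlier sketch, guarded by a mutex), so each voxel is transmitted, stored, and decompressed only once per machine.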

Load Balancing
  • Load Balancing
    • Demand-driven distribution of tiles (32×32 pixels)
    • Buffering of work tiles on the client
      • Avoid communication latency
  • Frame-to-Frame Coherence

Improves Caching

    • Keep rays on the same client
      • Simple: keep tiles on the same client (implemented; scheduler sketch below)
      • Better: assign tiles based on reprojected pixels (future work)
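Below is a sketch of such a demand-driven tile scheduler with the simple keep-tiles-on-the-same-client coherence; class and member names are assumptions for illustration. The master fills a queue of 32×32 tiles each frame and, when a client asks for work, first tries to hand back a tile that client rendered last frame so its cached voxels stay useful.

```cpp
// Demand-driven distribution of 32x32 pixel tiles with simple
// frame-to-frame coherence: prefer giving a client last frame's tiles again.
#include <algorithm>
#include <deque>
#include <unordered_map>
#include <vector>

struct Tile { int x, y; };                       // top-left corner of a 32x32 tile

class TileScheduler {
public:
    void beginFrame(int width, int height) {
        unassigned_.clear();
        thisFrame_.clear();
        for (int y = 0; y < height; y += 32)
            for (int x = 0; x < width; x += 32)
                unassigned_.push_back({x, y});
    }

    // Called whenever a client asks for more work; false = frame fully handed out.
    bool nextTile(int clientID, Tile& out) {
        auto& prev = lastFrame_[clientID];
        while (!prev.empty()) {                  // coherence: reuse last frame's tiles
            Tile t = prev.back(); prev.pop_back();
            if (take(t)) { out = t; thisFrame_[clientID].push_back(t); return true; }
        }
        if (unassigned_.empty()) return false;
        out = unassigned_.front(); unassigned_.pop_front();
        thisFrame_[clientID].push_back(out);
        return true;
    }

    void endFrame() { lastFrame_.swap(thisFrame_); }

private:
    bool take(const Tile& t) {                   // claim t if it is still unassigned
        auto it = std::find_if(unassigned_.begin(), unassigned_.end(),
                               [&](const Tile& u) { return u.x == t.x && u.y == t.y; });
        if (it == unassigned_.end()) return false;
        unassigned_.erase(it);
        return true;
    }

    std::deque<Tile> unassigned_;
    std::unordered_map<int, std::vector<Tile>> lastFrame_, thisFrame_;
};
```

The tile buffering mentioned above sits on top of this: a client requests its next tile before finishing the current one, hiding the request round trip.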
Results
  • Setup
    • Seven dual Pentium-III 800-866 MHz machines as rendering clients
      • 100 Mbit Fast Ethernet
    • One display & model server (same machine)
      • Gigabit Ethernet (already necessary for the pixel data)
  • Powerplant Performance
    • 3-6 fps in pure C implementation
    • 6-12 fps with SSE support
Animation: Framerate vs. Bandwidth

→ Latency hiding works!

Scalability

Server becomes the bottleneck beyond 12 CPUs

→ Distribute the model server!

Performance: Detail Views

Framerate (640×480): 3.9-4.7 fps (seven dual P-III 800-866 MHz clients, no SSE)

Shadows and Reflections

Framerate: 1.4-2.2 fps (no SSE)

Conclusions
  • IRT works great for highly complex models!
    • Distribution issues can be solved
    • At least as fast as sophisticated hardware techniques
    • Less preprocessing
    • Cheap
    • Simple & easy to extend (shadows, reflections, shading, …)
Future Work
  • Smaller cache granularity
  • Distributed scene server
  • Cache-coherent load balancing
  • Dynamic scenes & instances
  • Hardware support for ray-tracing
Acknowledgments
  • Anselmo Lastra, UNC
    • Power plant reference model

… other complex models are welcome…

Questions?

For further information visit

http://graphics.cs.uni-sb.de/rtrt

Detailed View of Power Plant

Framerate: 4.7 fps (seven dual P-III 800-866 MHz clients, no SSE)

Detail View: Furnace

Framerate: 3.9 fps (no SSE)

Overview
  • Reference Model
  • Previous Work
  • Distribution Issues
  • Massive Model Issues
  • Images & Demo