DDDDRRaW: A Prototype Toolkit for Distributed Real-Time Rendering on Commodity Clusters

Thu D. Nguyen and Christopher Peery
Department of Computer Science, Rutgers University

John Zahorjan
Department of Computer Science & Engineering, University of Washington


Overview

  • Improve real-time rendering performance using distributed rendering on commodity clusters

    • Real-time rendering -> interactive rendering applications

    • Improve performance -> Render more complex scenes at interactive rates

  • Why real-time rendering?

    • A critical component of an increasing number of continuous media applications

      • Virtual reality, data visualization, CAD, flight simulators, etc.

    • Rendering performance will continue to be a bottleneck

      • Model complexity is increasing as fast as (or faster than) hardware performance

      • Part of the challenge is to leverage increasingly powerful hardware accelerators


Challenges

  • How to structure the distributed renderer to leverage hardware-assisted rendering

    • Information that is useful for work partitioning and assignment may be hidden in the hardware rendering pipeline

  • How to minimize non-parallelizable overheads, which limit achievable speedup (Amdahl’s Law)

  • How to decouple the bandwidth requirement from scene complexity and cluster size
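
As a rough illustration of why serial overheads matter, Amdahl’s Law bounds the speedup on N nodes when a fraction s of each frame’s work cannot be parallelized (for example, work done only at the display node): speedup ≤ 1 / (s + (1 − s)/N). The Python sketch below only evaluates that bound; the function name is illustrative and the serial fractions in the usage loop are hypothetical values, not DDDDRRaW measurements.

```python
def amdahl_speedup(serial_fraction: float, nodes: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the per-frame work
    cannot be spread across `nodes` rendering nodes (Amdahl's Law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


if __name__ == "__main__":
    # Hypothetical serial fractions for illustration only.
    for s in (0.05, 0.10, 0.20):
        bounds = [round(amdahl_speedup(s, n), 2) for n in (4, 8, 16, 32)]
        print(f"serial={s:.2f}: {bounds}")
```

Even a 10% serial fraction caps the bound at 1 / (0.1 + 0.9/16) = 6.4 on 16 nodes, which is why the non-parallelizable work at the display node has to stay small.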


Image Layer Decomposition (ILD)

  • Per-frame rendering load is partitioned using ILD

    • Presented at IPDPS 2000

  • Briefly review ILD because it affects DDDDRRaW’s architecture and performance

  • Basic idea: assign scene objects such that sets of objects assigned to different nodes are not mutually occlusive

  • Advantages of using ILD

    • Does not need the 2D (screen-space) positions of polygons

      • This information may be hidden inside the graphics pipeline

    • Does not need Z-buffer (depth) information

      • Only color, not depth, is sent for each pixel, which reduces the required bandwidth by at least 50%
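
The “at least 50%” figure follows from per-pixel arithmetic: a depth-compositing scheme ships color plus depth for every pixel, while ILD ships color only, so the saving is depth_bytes / (color_bytes + depth_bytes), which is at least one half whenever depth uses at least as many bytes as color. A small Python sketch, assuming uncompressed layers and illustrative pixel formats (4-byte RGBA or 3-byte RGB color, 4-byte depth) that the slides do not specify; the function names are hypothetical.

```python
def layer_bytes(width: int, height: int, color_bytes: int, depth_bytes: int = 0) -> int:
    """Uncompressed size of one image layer, optionally carrying per-pixel depth."""
    return width * height * (color_bytes + depth_bytes)


def depth_free_saving(color_bytes: int, depth_bytes: int) -> float:
    """Fraction of per-pixel bandwidth saved by not shipping depth values."""
    return depth_bytes / (color_bytes + depth_bytes)


if __name__ == "__main__":
    # Assumed 640x480 layers; the real window size is not given on the slides.
    print(layer_bytes(640, 480, color_bytes=4))                 # color only
    print(layer_bytes(640, 480, color_bytes=4, depth_bytes=4))  # color + depth
    print(depth_free_saving(4, 4), depth_free_saving(3, 4))
```

For a 640x480 layer with 4-byte color and 4-byte depth, that is 1,228,800 bytes without depth versus 2,457,600 bytes with it; with 3-byte color the saving rises to 4/7, about 57%.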


Image Layer Decomposition (ILD)

[Figure: spatial partitioning of a scene; objects labeled 1-6]
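
The slide shows only the spatial-partitioning picture, so the following is a hypothetical Python sketch of the simplest version of the idea: split the scene’s bounding box at its center into eight octants and bin each object by its centroid. The 3-bit octant numbering and the function names are assumptions. Unlike a real decomposition, which splits polygons that straddle a boundary, this sketch leaves straddling objects whole.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def octant_index(p: Vec3, center: Vec3) -> int:
    """3-bit octant index relative to `center`: bit 0 is set on the +x side,
    bit 1 on the +y side, bit 2 on the +z side."""
    return (
        int(p[0] >= center[0])
        | int(p[1] >= center[1]) << 1
        | int(p[2] >= center[2]) << 2
    )


def bin_by_octant(centroids: Dict[str, Vec3], center: Vec3) -> Dict[int, List[str]]:
    """Assign each object (by its centroid) to one of the 8 octants.

    Simplification: objects crossing an octant boundary are NOT split here;
    they go to whichever octant contains their centroid."""
    cells: Dict[int, List[str]] = {i: [] for i in range(8)}
    for name, c in centroids.items():
        cells[octant_index(c, center)].append(name)
    return cells


if __name__ == "__main__":
    objs = {"chair": (-1.0, -1.0, -1.0), "lamp": (2.0, -0.5, -3.0), "desk": (1.0, 1.0, 1.0)}
    print(bin_by_octant(objs, center=(0.0, 0.0, 0.0)))
```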


ILD: Work Assignment

  • Non-mutually occlusive assignment -> legal for back-to-front compositing

  • Use a heuristic-based algorithm to

    • Balance load across the cluster

    • Minimize the screen real estate covered by each assignment

[Figure: example of a legal work assignment of scene objects 1-6]
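
One way to obtain an assignment that is both legal and load-balanced is sketched below in Python. This is a hypothetical illustration, not the heuristic DDDDRRaW uses: order octree cells back-to-front with respect to the viewpoint, then cut that order into contiguous runs, one per rendering node. Because each node’s cells form a contiguous slice of a valid visibility order, the resulting layers can be composited back-to-front without depth information. Octants are numbered by a 3-bit index whose bits record the +x, +y and +z sides of the split; per-cell loads (e.g., polygon counts) are assumed inputs, and the sketch ignores the second goal of minimizing screen area.

```python
from typing import List, Sequence


def back_to_front(eye_octant: int) -> List[int]:
    """Back-to-front order of the 8 children of one octree node.

    Sorting children by (child ^ eye_octant) gives a valid front-to-back order
    for axis-aligned splits through a common center, so the reversed
    sequence is back-to-front."""
    return [eye_octant ^ k for k in range(7, -1, -1)]


def _runs_needed(loads: Sequence[int], cap: int) -> int:
    """Number of contiguous runs needed if no run may exceed `cap`."""
    runs, current = 1, 0
    for w in loads:
        if current + w > cap:
            runs, current = runs + 1, w
        else:
            current += w
    return runs


def contiguous_assignment(loads: Sequence[int], nodes: int) -> List[List[int]]:
    """Cut per-cell loads (given in back-to-front order) into at most `nodes`
    contiguous runs, minimizing the heaviest run. Returns cell positions."""
    if not loads:
        return []
    lo, hi = max(loads), sum(loads)
    while lo < hi:  # binary search for the smallest feasible bottleneck load
        mid = (lo + hi) // 2
        if _runs_needed(loads, mid) <= nodes:
            hi = mid
        else:
            lo = mid + 1
    runs: List[List[int]] = [[]]
    current = 0
    for i, w in enumerate(loads):
        if current + w > lo and runs[-1]:
            runs.append([])
            current = 0
        runs[-1].append(i)
        current += w
    return runs


if __name__ == "__main__":
    order = back_to_front(eye_octant=7)   # eye on the +x,+y,+z side
    loads = [4, 1, 1, 4, 2, 2, 3, 3]      # hypothetical costs, in `order`
    print(order)
    print(contiguous_assignment(loads, nodes=3))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

Position i in each run refers to cell `order[i]`. Keeping runs contiguous is what preserves legality; an unconstrained bin-packing could balance load better but could hand one node cells that sandwich another node’s cells in depth.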


Implementation: Architecture

[Figure: architecture of a DDDDRRaW-based application]

  • Display node: the application (VRML scene, viewpoint, display window) uses the DDDDRRaW library

    • Partitioning, Assignment, Decompress, Compositing

  • Rendering nodes: each runs the DDDDRRaW library

    • Rendering, Compress

  • The display node sends each rendering node a work assignment; each rendering node returns a partial image
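
The display node’s compositing step can be sketched in Python as follows, under stated assumptions: each partial image layer is a flat list of pixels, `None` marks a pixel the layer did not cover, surfaces are opaque, and layers arrive ordered farthest first. Because ILD layers are not mutually occlusive, a nearer layer simply overwrites whatever it covers, with no depth test. The pixel representation and names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]
Layer = List[Optional[RGB]]  # None = pixel not covered by this layer


def composite_back_to_front(layers: Sequence[Layer], background: RGB) -> List[RGB]:
    """Composite opaque partial image layers; layers[0] is the farthest."""
    if not layers:
        raise ValueError("need at least one layer")
    size = len(layers[0])
    frame: List[RGB] = [background] * size
    for layer in layers:  # far to near: nearer layers overwrite covered pixels
        if len(layer) != size:
            raise ValueError("all layers must have the same size")
        for i, px in enumerate(layer):
            if px is not None:
                frame[i] = px
    return frame


if __name__ == "__main__":
    red, blue, black = (255, 0, 0), (0, 0, 255), (0, 0, 0)
    far = [red, red, None]
    near = [None, blue, None]
    print(composite_back_to_front([far, near], background=black))
```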


Implementation Details

  • Implemented an optimization to ILD: dynamic selection of octants to be rendered

    • Minimize overhead of geometric transformation due to polygon splitting (in scene decomposition)

  • Compression of image layers before communication

    • Reduce the bandwidth requirement to accommodate slower networks (e.g., 100 Mb/s LANs)

  • Use dynamic clipping to enforce octant boundaries for scenes with smooth shading and/or texturing

    • A simplification to ease implementation of the prototype; this clipping could (and should) be done statically

    • Adds 20-25 percent overhead on 5 of our 6 test scenes, overhead that would not be present in a production system
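
The slides do not say which compression scheme the image layers use. Because a partial layer typically covers only part of the screen and leaves long runs of empty background, run-length encoding is one natural candidate; the Python sketch below is a hypothetical stand-in for the rendering nodes’ Compress step and the display node’s Decompress step, not DDDDRRaW’s actual codec.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rle_encode(pixels: Sequence[T]) -> List[Tuple[T, int]]:
    """Collapse runs of identical pixels into (value, count) pairs."""
    runs: List[Tuple[T, int]] = []
    for px in pixels:
        if runs and runs[-1][0] == px:
            runs[-1] = (px, runs[-1][1] + 1)
        else:
            runs.append((px, 1))
    return runs


def rle_decode(runs: Sequence[Tuple[T, int]]) -> List[T]:
    """Expand (value, count) pairs back into the original pixel sequence."""
    out: List[T] = []
    for px, count in runs:
        out.extend([px] * count)
    return out


if __name__ == "__main__":
    row = [0] * 6 + [7, 7] + [0] * 4  # 0 = empty background, 7 = covered pixel
    packed = rle_encode(row)
    print(packed)
    assert rle_decode(packed) == row
```

The encoder’s cost is one pass over the layer, which matters here because compression sits on the critical path of every frame on the rendering nodes.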


Performance Measurement

  • Application: VRML viewer

    • VRweb – http://www.iicm.edu/vrwave

  • Collected 6 VRML scenes from the web

    • Used fixed paths through the scenes to measure performance as average frame rate (frames/sec)

  • Two clusters representing different points in the technology spectrum

    • Cluster of 5 SGI O2s

      • 180 MHz MIPS R5000, 256 MB memory, SGI Graphics Accelerator, 100 Mb/s switched Ethernet LAN

      • IRIX 6.5.7

    • Cluster of 13 PCs

      • Pentium III 800 MHz, 512 MB memory, Giganet 1 Gb/s cLAN

      • Red Hat Linux (kernel 2.2.14), Mesa 3D library version 3.2
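
“Average frame rate” over a fixed path can be computed in two ways that give different answers when frame times vary, and the slides do not say which was used. The Python sketch below shows both, plus a speed-up ratio; the function names are hypothetical, and the baseline for speed-up (assumed here to be a single-node run) is not stated on the slides.

```python
from typing import Sequence


def average_frame_rate(frame_times_s: Sequence[float]) -> float:
    """Frames rendered divided by total elapsed time along the path."""
    if not frame_times_s:
        raise ValueError("need at least one frame")
    return len(frame_times_s) / sum(frame_times_s)


def mean_instantaneous_rate(frame_times_s: Sequence[float]) -> float:
    """Arithmetic mean of per-frame rates (1 / frame time)."""
    if not frame_times_s:
        raise ValueError("need at least one frame")
    return sum(1.0 / t for t in frame_times_s) / len(frame_times_s)


def speedup(rate_cluster: float, rate_baseline: float) -> float:
    """Ratio of average frame rates (baseline assumed to be one node)."""
    return rate_cluster / rate_baseline


if __name__ == "__main__":
    times = [0.05, 0.05, 0.2]  # hypothetical per-frame times in seconds
    print(average_frame_rate(times), mean_instantaneous_rate(times))
```

For those frame times, the first definition gives about 10 fps and the second 15 fps: averaging instantaneous rates lets a few fast frames hide a slow one, so frames divided by elapsed time is the more conservative measure.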


Two Test Scenes


Overheads on SGI O2s


Overheads on PCs


Speed-up of Average Frame Rate on O2s


Speed-up of Average Frame Rate on PCs


Speed-up of Rendering Component on PCs


Conclusions

  • Can build an ILD-based distributed renderer to significantly improve real-time rendering performance on commodity hardware

  • DDDDRRaW currently scales only to modestly sized clusters

    • This limitation is due to our non-optimal hardware configurations

    • It is NOT due to a lack of suitable hardware: more suitable hardware is already available!

    • Expect good scalability to clusters of 16-32 nodes

  • Overlapping communication with computation increases the average frame rate, but ONLY at the expense of increased frame latency

    • The problem is CPU contention between rendering and communication

    • Either dedicated hardware is needed, or the optimization should only be applied once the renderer already reaches 10-15 fps, the nominal interactive frame rate

  • Project URL: www.cs.washington.edu/research/ddddrraw/


Overlapping Communication & Computation

  • Communication and compression are significant sources of overhead

  • Apply standard parallel optimization technique: overlap communication of rendered image layers for one frame with rendering of the next

  • Requires pipelining of DDDDRRaW
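
A back-of-the-envelope model makes the frame-rate/latency trade-off concrete. Suppose a rendering node spends r seconds rendering and compressing a frame and c seconds communicating it. Run sequentially, each frame takes r + c. Overlapped, a new frame can start every max(r, c) seconds, but each frame still needs r + c to get through. The `contention` factor below is a hypothetical knob that stretches both stages when rendering and communication share one CPU; the model and names are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    fps: float
    latency_s: float


def sequential(render_s: float, comm_s: float) -> FrameStats:
    """Render, then communicate, one frame at a time."""
    t = render_s + comm_s
    return FrameStats(fps=1.0 / t, latency_s=t)


def overlapped(render_s: float, comm_s: float, contention: float = 1.0) -> FrameStats:
    """Communicate frame i while rendering frame i+1 (two-stage pipeline).

    `contention` >= 1 inflates both stages to model CPU sharing."""
    if contention < 1.0:
        raise ValueError("contention must be >= 1")
    r, c = render_s * contention, comm_s * contention
    return FrameStats(fps=1.0 / max(r, c), latency_s=r + c)


if __name__ == "__main__":
    print(sequential(0.06, 0.04))
    print(overlapped(0.06, 0.04, contention=1.3))
```

With r = 60 ms and c = 40 ms, the sequential version runs at 10 fps with 100 ms latency; overlapping under a 30% contention penalty gives about 12.8 fps but 130 ms latency, which is the pattern described in the conclusions.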


The DDDDRRaW Pipeline

[Figure: three-stage pipeline between the display node and the rendering nodes]

  • Stage 1 (display node): ILD, Send

  • Stage 2 (rendering nodes): Receive, Render, Compress, Send

  • Stage 3 (display node): Receive, Decompress, Composite & Display
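
The structure of such a pipeline can be sketched in a single Python process with one thread per stage and bounded queues between them, so that several frames are in flight at once. This is only a structural illustration under stated assumptions: the stage functions are hypothetical placeholders, and in DDDDRRaW the stages run on separate machines connected by a network rather than as threads sharing one interpreter.

```python
import queue
import threading
from typing import Any, Callable, List, Tuple

_DONE = object()  # sentinel that shuts the pipeline down


def stage1_ild(frame: int) -> Tuple[int, str]:
    """Stage 1 placeholder (display node): partition and send assignments."""
    return frame, f"assignment-{frame}"


def stage2_render_compress(msg: Tuple[int, str]) -> Tuple[int, str]:
    """Stage 2 placeholder (rendering nodes): render and compress a layer."""
    frame, _assignment = msg
    return frame, f"layer-{frame}"


def stage3_composite_display(msg: Tuple[int, str]) -> str:
    """Stage 3 placeholder (display node): decompress, composite, display."""
    frame, _layer = msg
    return f"displayed-{frame}"


def _run_stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply `fn` to each item until the sentinel arrives, then forward it."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def run_pipeline(frames: List[int]) -> List[str]:
    q_in: queue.Queue = queue.Queue(maxsize=1)    # bounded: limits frames in flight
    q_mid: queue.Queue = queue.Queue(maxsize=1)
    q_back: queue.Queue = queue.Queue(maxsize=1)
    q_out: queue.Queue = queue.Queue()            # unbounded so stage 3 never blocks
    stages = [
        (stage1_ild, q_in, q_mid),
        (stage2_render_compress, q_mid, q_back),
        (stage3_composite_display, q_back, q_out),
    ]
    threads = [threading.Thread(target=_run_stage, args=s, daemon=True) for s in stages]
    for t in threads:
        t.start()
    for f in frames:
        q_in.put(f)
    q_in.put(_DONE)
    shown: List[str] = []
    while True:
        item = q_out.get()
        if item is _DONE:
            break
        shown.append(item)
    for t in threads:
        t.join()
    return shown


if __name__ == "__main__":
    print(run_pipeline([0, 1, 2]))
```

One thread per stage keeps frames in order. Python threads share one interpreter, so this shows the dataflow rather than real parallel speedup, which mirrors the CPU-contention issue the conclusions raise when stages compete for the same processor.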


Average Frame Rates


Average Frame Latency

