Introduction to Parallel Rendering: Sorting, Chromium, and MPI

Mengxia Zhu, Spring 2006

Presentation Transcript
Parallel Rendering
  • Graphics rendering process is computationally intensive
  • Parallel computation is a natural measure to leverage for higher performance
  • Two levels of parallelism:
    • Functional parallelism – pipelining
    • Data parallelism – multiple results computed at the same time
Data Parallel Algorithms
  • Many taxonomies exist for categorizing parallel rendering algorithms
    • Image space vs. object space
    • Shared memory architecture, distributed memory architecture
    • MPI, OpenMP, …
  • Need a uniform framework to study and understand parallel rendering
Sorting in Rendering
  • Rendering as a sorting process:
    • Sort from object coordinates to screen coordinates
    • Use this concept to study computational and communication costs
  • The key procedure: calculating the effect of each primitive on each pixel
Sorting Categories
  • The location of this ‘sort’ determines the structure of the parallel algorithm
  • Sort-first
    • during geometry processing
    • distributes “raw” primitives
  • Sort-middle
    • between geom. processing and rasterization
    • distributes screen-space primitives
  • Sort-last
    • during rasterization
    • distributes pixels/fragments
[Diagram: the three sorting classes drawn as pipelines of geometry (G) and rasterization (R) processors. In sort-first, redistribution happens before the G stages; in sort-middle, between G and R; in sort-last, after R, with a compositing (C) stage merging pixels.]
Sorting cont
  • A landmark paper: “A Sorting Classification of Parallel Rendering”, Molnar et al., IEEE CG&A ’94.
Sort First
  • Primitives initially assigned arbitrarily
  • Pre-transformation is done to determine which screen regions are covered
  • Primitives are then redistributed over the network to the correct renderer
  • Renderer performs the work of the entire pipeline for that primitive from that point on
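The redistribution step above can be sketched as follows. This is a minimal illustration, not part of the original slides: the screen dimensions, tile grid, and function name are all assumptions, and the bounding box is assumed to be already clipped to the screen.

```c
#include <assert.h>

/* Sketch of sort-first redistribution: the screen is partitioned into a
   TILES_X x TILES_Y grid of non-overlapping tiles, each owned by one
   renderer. A primitive's screen-space bounding box (assumed clipped to
   the screen) determines which renderers must receive it. */
#define SCREEN_W 1024
#define SCREEN_H 768
#define TILES_X  4
#define TILES_Y  3

/* Fill 'owners' with the row-major indices of every tile the bounding
   box overlaps; return how many there are. */
int tiles_for_bbox(int xmin, int ymin, int xmax, int ymax, int owners[])
{
    int tw = SCREEN_W / TILES_X, th = SCREEN_H / TILES_Y;
    int n = 0;
    for (int ty = ymin / th; ty <= ymax / th; ty++)
        for (int tx = xmin / tw; tx <= xmax / tw; tx++)
            owners[n++] = ty * TILES_X + tx;
    return n;
}
```

A primitive straddling a tile boundary is sent to every tile it touches, which is one source of the overlap overhead sort-first pays.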
Sort First cont
  • Screen space is partitioned into non-overlapping 2D tiles, each of which is rendered independently by a tightly coupled pair of geometry and rasterization processors.
  • Sub-images of the 2D tiles are composited without depth comparison.
Analysis Terms
  • Assume a dataset containing n_r raw primitives with average size a_r.
  • We will call primitives that result from tessellation display primitives. If T is the tessellation ratio, there are n_d = T·n_r of these, with average size a_d = a_r / T. If there is no tessellation, T = 1, n_d = n_r, and a_d = a_r.
  • Assume an image containing A pixels, with S samples to compute per pixel. Assume that all primitives fall within the viewing frustum.
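The tessellation relations can be made concrete with a small sketch; the specific numbers in the usage note below are made up for illustration.

```c
#include <assert.h>

/* The tessellation relations from the analysis terms:
   n_r raw primitives of average size a_r become n_d = T * n_r display
   primitives of average size a_d = a_r / T. */
static double display_count(double T, double n_r) { return T * n_r; }
static double display_size(double T, double a_r)  { return a_r / T; }
```

For example, with n_r = 10,000, a_r = 40 pixels, and T = 4, this gives n_d = 40,000 display primitives of average size a_d = 10 pixels; note that the total covered area n_d·a_d = n_r·a_r is unchanged by tessellation.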
Sort-first analysis
  • Pros:
    • Low communication requirements when tessellation or oversampling is high, or when inter-frame coherence is exploited
    • Processors implement entire rendering pipeline for a given screen region
  • Cons:
    • Susceptible to load imbalance (clumping)
    • Exploiting coherence is difficult
Sort Middle
  • Primitives initially assigned arbitrarily
  • Primitives fully transformed, lit, etc., by the geometry processor to which they are initially assigned
  • Transformed primitives are distributed over the network to the rasterizer assigned to their region of the screen
Sort Middle Analysis
  • Pros:
    • Redistribution occurs at a “natural” place
  • Cons:
    • High communication cost if T is high
    • Susceptible to load imbalance in the same way as sort-first
  • Overhead:
    • Display primitive distribution cost
    • Tessellation factor
Sort Last
  • Defers sorting until the end of the pipeline (the image composition phase)
  • Renderers operate independently until the visibility stage
  • Fragments transmitted over network to compositing processors to resolve visibility
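The visibility resolution at the compositing processors can be sketched as a per-pixel depth test. This is an illustrative sketch only: the struct layout, names, and the smaller-depth-is-closer convention are assumptions.

```c
#include <assert.h>

/* Sketch of sort-last pixel merging: each renderer produces a full-frame
   color + depth buffer, and the compositor keeps the nearer fragment at
   every pixel (smaller depth = closer to the viewer). */
typedef struct { unsigned color; float depth; } Fragment;

/* Merge 'src' into 'dst' over n pixels using the depth test. */
void depth_composite(Fragment *dst, const Fragment *src, int n)
{
    for (int i = 0; i < n; i++)
        if (src[i].depth < dst[i].depth)
            dst[i] = src[i];
}
```

Because every renderer holds a full-frame buffer, the amount of data flowing through this merge is proportional to the number of pixels times the number of renderers, which is why pixel traffic dominates the cost of sort-last.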
Sort Last Analysis
  • Pros:
    • Renderers implement full pipeline and are independent until pixel merging
    • Less prone to load imbalance
    • Very scalable
  • Cons:
    • Pixel traffic can be extremely high

Image Composition

  • A naïve approach is binary compositing.
  • Each disjoint pair of processors produces a new subimage.
  • N/2 subimages are left after the first stage.
  • Half of the original processors are paired up for the next level of compositing, so the other half sit idle.
  • The binary-swap compositing method makes sure that every processor participates in all the stages of the process.
  • The key idea – at each compositing stage, the two processors involved in a composite operation split the image plane into two pieces.
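The splitting described above can be sketched as bookkeeping over a 1-D span of pixels. This is an assumed simplification: the partner choice by XOR and the keep-upper/keep-lower rule are one common convention, and the actual pixel exchange and blend are omitted.

```c
#include <assert.h>

/* Binary-swap bookkeeping for N = power-of-two processors: at each stage
   a processor pairs with (rank ^ bit) and keeps one half of its current
   span of the image, so after log2(N) stages each processor owns exactly
   1/N of the image. 'width' is the image span in pixels. */
void binary_swap_span(int rank, int nprocs, int width, int *lo, int *hi)
{
    *lo = 0;
    *hi = width;
    for (int bit = 1; bit < nprocs; bit <<= 1) {
        int mid = (*lo + *hi) / 2;
        if (rank & bit)   /* higher-ranked partner keeps the upper half */
            *lo = mid;
        else
            *hi = mid;
    }
}
```

Every processor stays busy at every stage, and the final per-processor spans are disjoint and cover the whole image, which is exactly the property that plain binary compositing lacks.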

Binary Swap Example

  • The binary-swap compositing algorithm for four processors:
Which to choose?
  • It depends.
  • Which ones can be best matched to hardware capabilities?
  • Number of primitives, tessellation factor, coherence, etc., are all considerations. Many tradeoffs.
Load Balancing
  • For better load balancing,
    • Task queuing: the task queue can be ordered by decreasing task size, so that task granularity becomes finer as the queue is drained.
    • Load stealing: having nodes steal smaller tasks from other nodes, once they have completed their own tasks
    • Time stamp: a timeout is attached to each task; if a node cannot finish its task before the timeout, the remnant of the task is re-partitioned and re-distributed.
  • Hierarchical data structures, such as octree, k-d tree, etc., are commonly used.
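The task-queuing strategy above can be sketched in a few lines; the function names are illustrative, and task "size" is reduced here to a single integer cost estimate.

```c
#include <stdlib.h>

/* Sketch of the task-queuing idea: order the queue by decreasing task
   size so large tasks are dispatched first, leaving the small tasks at
   the tail to smooth out load imbalance near the end of the frame. */
static int by_size_desc(const void *a, const void *b)
{
    int sa = *(const int *)a, sb = *(const int *)b;
    return sb - sa;   /* larger sizes sort first */
}

void order_task_queue(int *task_sizes, int n)
{
    qsort(task_sizes, n, sizeof(int), by_size_desc);
}
```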
References
  • These slides reference contents from
    • Jian Huang at University of Tennessee at Knoxville
    • William Gropp and Ewing Lusk at Argonne National Laboratory