Real-Time Parallel Radiosity

Matt Craighead

May 8, 2002

6.338J/18.337J, Course 6 AUP

What is Radiosity?
  • A computer graphics technique for lighting
  • Two types of lighting algorithms:
    • Local: easy, fast, but not realistic
    • Global: slow, difficult, but highest quality
  • Radiosity is a global algorithm
  • Global algorithms try to take into account interreflections in scenes.
Radiosity in Real Time?
  • Local algorithms can easily run in real time, either in software or hardware.
  • Computational demands grow with the number of surfaces and the number of lights. $10^6$ surfaces is not unreasonable!
  • Most global algorithms take quadratic time in number of surfaces (all interactions!).
Local Doesn’t Mean Bad

Id Software’s Doom 3 (video capture)

…but Global is Better

State-of-the-art radiosity rendering from 1988 (5 hours render time!)

Local Lighting Math
  • Consider a light source at some point in three-dimensional space, and a surface at some other location.
  • How much does the light directly contribute to the brightness of the surface?
  • Note that global algorithms still need to do local lighting as a first step.
Local Lighting Math
  • Set up a 3-dimensional coordinate system centered on the surface. Define unit vectors L (light), N (normal), E (eye):
        • L points towards the light.
        • N is perpendicular to the surface.
        • E points towards the viewer.




Local Lighting Math
  • If an object sits between the light and surface, the lighting contribution is zero.
  • Otherwise…
  • Surfaces generally reflect light in two ways: “diffuse” (dull) and “specular” (shiny).
  • We can see three colors: red, green, blue.
  • So let $M_d$, $M_s$ be 3-vectors indicating how much of [R,G,B] is reflected in each way.
Local Lighting Math
  • Diffuse lighting is independent of view angle. It is brightest when N and L are most closely aligned, and falls off with the cosine of the angle between them.
  • All lighting also falls off with the square of the distance from the light source.
  • So, the diffuse term is $d^{-2} M_d (N \cdot L)$, where d is the distance to the light.
Local Lighting Math
  • Specular lighting is view-dependent. In one simple formulation (Blinn shading), we determine the vector H halfway between E and L and evaluate $d^{-2} M_s (N \cdot H)^s$, where s indicates the shininess of the surface.
Local Lighting Math
  • The easiest way to think of H, the half-angle vector, is that if H and N were aligned, and the surface were a mirror, then the light would reflect straight to the eye.
  • $(N \cdot H)^s$ can be thought of as representing a probability distribution of “microfacets,” whose normals are clustered around N but do vary. Smoother surfaces have higher s.
Local Lighting Math
  • So, our full contribution from a light source is $d^{-2} (M_d (N \cdot L) + M_s (N \cdot H)^s)$.
    • This may also be multiplied by a light color.
  • We may also add in an emissive term $M_e$ for glowing objects.
  • If the lack of interreflection makes things too dark, we may add an “ambient” term $L_a M_d$.
  • To save time, we may compute this formula only at the vertices of the object instead of at every pixel on the screen.
  • The specular exponent may be evaluated using a power approximation rather than a real power function.
    • The specular formula is already a cheesy hack…
More Approximations
  • Shadows are hard. At the cost of much realism, they can be omitted or faked.
    • Draw a little dark spot under an object
  • Ambient is itself an approximation of real global illumination.
  • $N \cdot L$ and $(N \cdot H)^s$ are idealizations; in reality, the reflectance can be an arbitrary 4-dimensional function called a “BRDF.”
  • Radiosity is a global algorithm that handles diffuse lighting only.
  • The term “global illumination” refers to global algorithms that handle specular lighting also.
    • Specular makes things much more difficult.
  • I will only discuss plain radiosity.
Radiosity in a Nutshell
  • Suppose there are n surfaces in the scene.
  • Let $A_i$ be the area of surface i.
  • Let $E_i$ be the amount of light energy emitted from surface i per unit time and area.
  • Let $B_i$ be the amount of light energy emitted and/or reflected from surface i per unit time and area.
  • Let $\rho_i$ be the diffuse albedo of surface i.
Radiosity in a Nutshell
  • Now, let $F_{ij}$ (called a “form factor”) be the fraction of light from surface i that reaches surface j.
  • Then, for all i, we must have:

$A_i B_i = A_i E_i + \rho_i \sum_{j \in [1,n]} A_j B_j F_{ji}$

  • This is just a linear system with n equations and n unknowns. Solve and you get the B’s.
That Easy?
  • Well, not quite that easy.
  • Solving the system of equations is $O(n^3)$.
    • So we’ll iterate instead.
  • We still have to do local lighting.
    • The results become the $E_i$’s.
  • We have to compute the $F_{ij}$ terms somehow.
    • This turns out to be expensive.
    • If the scene is static, precompute!
Computing Form Factors
  • $F_{ij}$ turns out to be a big ugly integral over the area of both i and j.
  • Worse, one of the terms in the integral is whether the two dA’s can see one another!
  • So, no closed-form solution.
  • Standard numerical integration is no good. A raycast per sample takes too long.
Computing Form Factors
  • The usual solution is called the “hemicube algorithm.”
    • Render the scene from the point of view of the surface, in all directions. In effect, you are projecting onto a hemicube.
    • Count up the number of times you can see each surface (weighted appropriately).
    • Takes advantage of 3D acceleration!
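The “weighted appropriately” step can be sketched as follows. The weights used here are the standard hemicube delta form factors for a unit hemicube, which the slides do not spell out, so treat them as an assumption about what was meant. The item-buffer layout (a 2D list of surface ids per face) and all function names are hypothetical, and `res` is assumed to be even.

```python
import math

def hemicube_delta_form_factors(res):
    """Return (top, side) per-pixel weights for a unit hemicube.

    top[i][j]  : pixel on the top face (z = 1, x and y in [-1, 1])
    side[k][j] : pixel on one side face (x = 1, y in [-1, 1], z in [0, 1])
    """
    step = 2.0 / res
    dA = step * step
    centers = [-1.0 + (i + 0.5) * step for i in range(res)]
    heights = [(k + 0.5) * step for k in range(res // 2)]
    top = [[dA / (math.pi * (x * x + y * y + 1.0) ** 2) for x in centers]
           for y in centers]
    side = [[z * dA / (math.pi * (y * y + z * z + 1.0) ** 2) for y in centers]
            for z in heights]
    return top, side

def total_weight(res):
    """Sum of all weights over the top face and the four side faces."""
    top, side = hemicube_delta_form_factors(res)
    return sum(map(sum, top)) + 4 * sum(map(sum, side))

def form_factors_from_item_buffer(ids, weights):
    """Sum pixel weights per visible surface id for one hemicube face."""
    totals = {}
    for id_row, w_row in zip(ids, weights):
        for surface_id, w in zip(id_row, w_row):
            totals[surface_id] = totals.get(surface_id, 0.0) + w
    return totals
```

The weights over the whole hemicube sum to approximately 1, which is a handy sanity check: a surface that fills the entire view gets a form factor of about 1.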
Simplified Radiosity Equation
  • It so happens that $F_{ji} A_j = F_{ij} A_i$.
    • This is a simple property of the integral for F.
  • So we can simplify the radiosity equation:

$A_i B_i = A_i E_i + \rho_i \sum_{j \in [1,n]} A_j B_j F_{ji}$

$A_i B_i = A_i E_i + \rho_i \sum_{j \in [1,n]} A_i B_j F_{ij}$

$B_i = E_i + \rho_i \sum_{j \in [1,n]} B_j F_{ij}$

$B = E + RB$ (where $R_{ij} = \rho_i F_{ij}$)
Solving the Radiosity Equation
  • $B = E + RB$ is just a matrix/vector equation.
  • Direct solution: $B = (I - R)^{-1} E$
  • Iterative solution:
    • If E is a local lighting solution, then call it $B_0$.
    • Now let $B_{i+1} = B_0 + R B_i$.
    • Then, $B_i$ is simply the lighting solution after i bounces! Since $\rho_j < 1$ for every surface j (conservation of energy), the $B_i$’s converge to B.
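The iteration can be sketched in plain Python with a dense R stored as a list of rows. The function name and the tiny two-surface example in the usage note are made up for illustration; a real solver would use the sparse, fixed-point storage discussed later.

```python
def iterate_radiosity(E, R, bounces):
    """Return B after `bounces` steps of B_{i+1} = B_0 + R B_i, with B_0 = E."""
    n = len(E)
    B = list(E)
    for _ in range(bounces):
        # Every entry of the new B is computed from the previous B.
        B = [E[i] + sum(R[i][j] * B[j] for j in range(n)) for i in range(n)]
    return B
```

For two surfaces with E = [1, 0] and R = [[0, 0.4], [0.4, 0]], the direct solution is B = [1/0.84, 0.4/0.84]. Five bounces already land within about 0.005 of it, in line with treating the bounce count as a small constant.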
Iterative vs. Direct Radiosity
  • If F is a dense matrix, then direct solution takes time $O(n^3)$, while k steps of iteration take time $O(kn^2)$.
  • Realistically, k is just a constant; say, 5.
  • Iterative solution is practical with n ranging up to the hundreds of thousands!
  • Iteration time is proportional to the number of nonzero entries of F.
Sparsity of Form Factors
  • In a simple cube-shaped room, all form factors will be nonzero except for pairs on the same wall.
    • So F is 5/6 nonzero. Not very encouraging…
    • As more objects are added, F becomes sparser.
  • As the scene expands beyond one room, F becomes much sparser.
  • So iterative radiosity scales extremely well!
Storage and Precision
  • Radiosity hogs memory.
    • If you grid a cube-shaped room 100x100 on each wall, and store F as a dense matrix of floats, that’s 14.4 GB. (!!!)
  • Storing F as sparse helps a lot.
    • Good for iteration speed too.
  • Using smaller values than floats helps too.
    • 16-bit fixed-point is good enough.
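The storage numbers are easy to reproduce. The helper names below are made up; the only inputs are the 100x100 grid on each of 6 walls and the observation that form factors between patches on the same wall are zero.

```python
def dense_storage_bytes(grid, walls=6, bytes_per_entry=4):
    """Bytes needed to store F densely for a cube room gridded grid x grid per wall."""
    n = walls * grid * grid  # total number of surfaces
    return n * n * bytes_per_entry

def nonzero_entries(grid, walls=6):
    """Entries of F that can be nonzero: all pairs except those on the same wall."""
    per_wall = grid * grid
    n = walls * per_wall
    return n * n - walls * per_wall * per_wall

floats_dense = dense_storage_bytes(100)                    # 14,400,000,000 bytes = 14.4 GB
fixed16_dense = dense_storage_bytes(100, bytes_per_entry=2)  # 7,200,000,000 bytes
fixed16_sparse = nonzero_entries(100) * 2                  # 6,000,000,000 bytes, before index overhead
```

So dropping to 16-bit values halves the footprint, and skipping the same-wall zeros removes another sixth. The real savings come from the much sparser F of a scene with more than one room.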
RLE Encoding of Form Factors
  • F tends to have runs of zeros and nonzeros.
  • Smart traversal order of grids makes the runs longer.
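To illustrate why long runs help, here is a minimal run-length encoder for one row of 16-bit form factors. The record layout is invented for this sketch and is not the actual disk or memory format: a run is either a count of zeros, a count of 65535s, or a list of literal values.

```python
def rle_encode(values):
    """Encode 16-bit values as runs: ["zero", n], ["full", n] or ["raw", [v, ...]]."""
    runs = []
    for v in values:
        kind = "zero" if v == 0 else "full" if v == 65535 else "raw"
        if runs and runs[-1][0] == kind:
            if kind == "raw":
                runs[-1][1].append(v)  # extend the literal run
            else:
                runs[-1][1] += 1       # extend the constant run
        else:
            runs.append([kind, [v] if kind == "raw" else 1])
    return runs

def rle_decode(runs):
    """Invert rle_encode."""
    out = []
    for kind, payload in runs:
        if kind == "raw":
            out.extend(payload)
        else:
            out.extend([0 if kind == "zero" else 65535] * payload)
    return out
```

A constant run costs the same no matter how long it is, so a traversal order that keeps mutually invisible patches adjacent in the row directly improves the compression ratio.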



RLE Encoding of Form Factors
  • My disk storage format:
    • 1 byte: run length, run type
      • Length up to 84, type is zero, 255, or 65535
    • Then, variable # bytes with run data
    • Compression ratio for my scene: 5.97:1
  • My memory storage format:
    • 2 bytes: run length up to 65535
    • 2 bytes: run type (zero or 65535)
    • 2N bytes: run data
    • Compression ratio for my scene: 2.49:1
  • Split up the surfaces among the CPUs.
    • Each CPU owns those rows of the form factor matrix.
    • Each CPU computes its surfaces’ local lighting.
    • Every iteration requires an all-to-all communication, so that each CPU has the full B vector.
  • At present, my storage of F is unbalanced; compression ranges from 4.3:1 to 1.6:1.
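The row partitioning can be simulated without MPI. In this sketch each “CPU” is just a loop iteration, and the all-to-all exchange is a list concatenation; the function names are hypothetical. Rows are split by count, which is one plausible reason for imbalance when the compressed rows differ in size.

```python
def split_rows(n, num_cpus):
    """Return [start, end) row ranges, as evenly sized as possible."""
    base, extra = divmod(n, num_cpus)
    ranges, start = [], 0
    for rank in range(num_cpus):
        end = start + base + (1 if rank < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def parallel_step(E, R, B, num_cpus):
    """One iteration B' = E + R B, computed slice by slice as each CPU would."""
    slices = []
    for start, end in split_rows(len(E), num_cpus):
        # Each simulated CPU reads the full B but writes only the rows it owns.
        slices.append([E[i] + sum(r * b for r, b in zip(R[i], B))
                       for i in range(start, end)])
    # Stand-in for the all-to-all: afterwards every CPU holds the full new B.
    return [b for part in slices for b in part]
```

In a real MPI program the final concatenation would be a collective exchange, or the nonblocking sends and receives mentioned later, so that it can overlap with computation.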
Radiosity Iteration Kernel
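  • Hand-written MMX assembly code (only main inner loop shown):

inner_loop:
    prefetchnta [ebx+128]     ; prefetch data 128 bytes ahead in both streams
    prefetchnta [eax+128]
    movq    mm4, [eax]        ; load four 16-bit words from [eax]
    pshufw  mm7, mm4, 0xFF    ; mm7 = word 3 replicated in all four lanes
    pshufw  mm6, mm4, 0xAA    ; mm6 = word 2 replicated
    pshufw  mm5, mm4, 0x55    ; mm5 = word 1 replicated
    pshufw  mm4, mm4, 0x00    ; mm4 = word 0 replicated
    pmulhuw mm4, [ebx]        ; unsigned 16x16 multiply, keep the high 16 bits
    pmulhuw mm5, [ebx+6]      ; operands are 6 bytes apart, so each 8-byte read overlaps the next
    pmulhuw mm6, [ebx+12]
    pmulhuw mm7, [ebx+18]
    paddw   mm0, mm4          ; accumulate into four separate running sums
    paddw   mm1, mm5
    paddw   mm2, mm6
    paddw   mm3, mm7
    add     eax, 8            ; advance 4 words in the [eax] stream
    add     ebx, 24           ; advance 4 x 6 bytes in the [ebx] stream
    dec     esi               ; esi counts the remaining groups of four
    jnz     inner_loop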
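What the loop computes can be mimicked in Python. This is a reading of the listing, not the original C kernel: it assumes the stream at `eax` holds 16-bit form factors treated as fractions of 65536, and the stream at `ebx` holds 16-bit R, G, B radiosity triples. The function names are made up.

```python
def mul_high_u16(a, b):
    """High 16 bits of the product of two unsigned 16-bit values (pmulhuw, per lane)."""
    return (a * b) >> 16

def accumulate_row(form_factors, radiosity_rgb):
    """Sum F_ij * B_j over j for one row, per color channel, in 16-bit fixed point."""
    acc = [0, 0, 0]
    for f, rgb in zip(form_factors, radiosity_rgb):
        for c in range(3):
            # paddw wraps modulo 2^16, so the sum is masked the same way.
            acc[c] = (acc[c] + mul_high_u16(f, rgb[c])) & 0xFFFF
    return acc
```

For example, a form factor of 32768 (one half) applied to a radiosity of 1000 contributes 500. The assembly does four form factors per pass and keeps four partial sums, which presumably get added together after the loop.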

Radiosity Kernel Performance
  • Timed 8 CPUs doing 500 iterations.
    • Portable C fixed-point kernel: 38.04 s
    • Optimized MMX kernel: 21.64 s
  • On the most loaded CPU, this works out to 528 million multiply-adds/s for the C version and 1.24 billion multiply-adds/s for MMX.
  • But the MMX code wastes 25% of them, so its real rate is 928 million multiply-adds/s.
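These figures can be cross-checked with a little arithmetic. The explanation for the 25% waste in the comment is an assumption read off the 6-byte stride in the kernel: each 4-lane multiply covers an R, G, B triple plus one lane that is thrown away.

```python
c_time, mmx_time = 38.04, 21.64   # seconds for 500 iterations on 8 CPUs
c_rate = 528e6                    # multiply-adds/s, C kernel, most loaded CPU
mmx_raw_rate = 1.24e9             # multiply-adds/s issued by the MMX kernel

speedup = c_time / mmx_time                 # about 1.76x
useful_from_speedup = c_rate * speedup      # about 928 million/s
useful_from_lanes = mmx_raw_rate * 0.75     # 930 million/s (assumed: 3 useful lanes out of 4)
```

Both routes agree to within rounding of the quoted 1.24 billion figure.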
Local Lighting Implementation
  • One raycast is required for each quad, for each light source! This can be expensive.
  • To accelerate raycasts, I made a simplified version of my scene that was virtually indistinguishable for raycasting purposes.
    • 13028 quads reduced to 120 polys, 110 cylinders
    • Some cylinders used as geometry, some as bounding volumes
Overall Performance
  • Again, 8 CPUs on 500 iterations:
    • Iteration only: 21.64 s
    • Communication only: 5.96 s
    • Iteration plus communication: 26.84 s
    • All computation: 65.7 s
    • All computation plus communication: 64.38 s
Remarks on Performance
  • The communication overlaps very well with the computation, to the point that it is actually a speedup. (!)
    • MPI_Isend, MPI_Irecv are essential to achieving this.
  • The $O(n)$ local lighting computation is actually taking much longer than the $O(n^2)$ radiosity computation.
    • Local lighting is only pseudo-$O(n)$ because of the raycast cost, although for large scenes, raycast cost should still be much less than $O(n^2)$, due to other optimizations.
Radiosity Frontend
  • Separate client application that runs on a Windows PC with OpenGL acceleration.
    • Radiosity solver running on cluster is server.
  • Original plan was that the frontend would send the scene to the server, and the server would use the scene provided.
    • Since the cluster has no OpenGL acceleration, I was reluctantly forced to precompute form factors.
    • All aspects of scene except form factors still sent to the server by the client; form factors are read from disk.
Client/Server Architecture
  • User precomputes form factors and FTPs them to the Beowulf.
  • Server listens on
    • Client connects and sends scene information.
  • Server reads form factors off of disk.
  • Both open a network thread.
    • Server streams radiosity to client via TCP/IP.
    • Computation, rendering, and communication are completely decoupled.
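The streaming side can be sketched with Python's standard `socket`, `struct` and `threading` modules. The wire format here (a 4-byte count followed by 32-bit floats) is invented for illustration, as are the function names. A connected socket pair stands in for the real TCP connection so the sketch runs on one machine.

```python
import socket
import struct
import threading

def send_radiosity(sock, values):
    """Send one radiosity vector as a 4-byte count followed by 32-bit floats."""
    payload = struct.pack(f"<{len(values)}f", *values)
    sock.sendall(struct.pack("<I", len(values)) + payload)

def recv_exact(sock, size):
    """Read exactly `size` bytes, since recv may return fewer."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data

def recv_radiosity(sock):
    """Receive one vector written by send_radiosity."""
    (count,) = struct.unpack("<I", recv_exact(sock, 4))
    return list(struct.unpack(f"<{count}f", recv_exact(sock, 4 * count)))

def demo():
    """Stream two solutions from a 'server' thread to the 'client'."""
    server_end, client_end = socket.socketpair()
    solutions = [[0.0, 0.5], [0.25, 0.75]]

    def serve():
        for b in solutions:
            send_radiosity(server_end, b)
        server_end.close()

    network_thread = threading.Thread(target=serve)
    network_thread.start()
    received = [recv_radiosity(client_end) for _ in solutions]
    network_thread.join()
    client_end.close()
    return received
```

Putting the sends on their own thread is what keeps the solver from waiting on the network, and the client can keep rendering the last solution it received while the next one arrives.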
Frontend Features
  • Per-vertex or per-pixel local lighting, with local viewer. Optional specular.
  • Shadows implemented using stencil buffer.
  • Display radiosity input/output (E and B).
  • Bilinear filtering of radiosity solutions on grid-shaped surfaces.
  • Ultra-high-quality mode where radiosity is used for indirect lighting only.
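Bilinear filtering of a gridded radiosity solution amounts to blending the four nearest grid values. The sketch below does it in software with made-up names and assumes coordinates are measured in grid cells; on the actual frontend this would presumably be done by the graphics hardware's texture filtering.

```python
def bilinear_sample(grid, u, v):
    """Sample a 2D grid of radiosity values at fractional (u, v).

    u runs across columns, v down rows; both are clamped to the grid.
    """
    rows, cols = len(grid), len(grid[0])
    u = min(max(u, 0.0), cols - 1.0)
    v = min(max(v, 0.0), rows - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = u - x0, v - y0
    # Blend horizontally on the two neighboring rows, then vertically.
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```

Sampling the grid [[0, 1], [2, 3]] at its center gives 1.5, the average of the four corners.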
Work Yet to be Done
  • More complex scene would be nice. This one is 13K quads, and I should be able to do 50K.
  • More optimization work on raycasting.
  • Better load balancing.
  • Optimization of some modes on frontend, so they run reasonably on my laptop’s GeForce2 Go, not just on a GeForce4 Ti…
  • Alleviation of ugly banding on certain lighting modes caused by 8-bit-per-component precision.
  • Real-time radiosity is feasible.
    • Not tomorrow, but today.
  • If today’s cluster is tomorrow’s desktop, real-time radiosity could start showing up in real applications not too many years from now.
  • Biggest limitation may be the ability to compute form factors efficiently.
    • Faster graphics hardware will make this happen.