
# Real-Time Parallel Radiosity

Real-Time Parallel Radiosity. Matt Craighead, May 8, 2002. 6.338J/18.337J, Course 6 AUP. What is Radiosity? A computer graphics technique for lighting. Two types of lighting algorithms: local (easy, fast, but not realistic) and global (slow, difficult, but highest quality).


May 8, 2002

6.338J/18.337J, Course 6 AUP

What is Radiosity?
• A computer graphics technique for lighting
• Two types of lighting algorithms:
• Local: easy, fast, but not realistic
• Global: slow, difficult, but highest quality
• Radiosity is a global algorithm
• Global algorithms try to take into account interreflections in scenes.
• Local algorithms can easily run in real time, either in software or hardware.
• Computational demands grow with number of surfaces and number of lights. $10^6$ surfaces is not unreasonable!
• Most global algorithms take quadratic time in number of surfaces (all interactions!).

Id Software’s Doom 3 (video capture)

…but Global is Better

State-of-the-art radiosity rendering from 1988 (5 hours render time!)

Local Lighting Math
• Consider a light source at some point in three-dimensional space, and a surface at some other location.
• How much does the light directly contribute to the brightness of the surface?
• Note that global algorithms still need to do local lighting as a first step.
Local Lighting Math
• Set up a 3-dimensional coordinate system centered on the surface. Define unit vectors L (light), N (normal), E (eye):
• L points towards the light.
• N is perpendicular to the surface.
• E points towards the viewer.

[Figure: unit vectors N, E, and L drawn at the surface]

Local Lighting Math
• If an object sits between the light and surface, the lighting contribution is zero.
• Otherwise…
• Surfaces generally reflect light in two ways: “diffuse” (dull) and “specular” (shiny).
• We can see three colors: red, green, blue.
• So let $M_d$, $M_s$ be 3-vectors indicating how much of [R,G,B] are reflected in each way.
Local Lighting Math
• Diffuse lighting is independent of view angle. It is brightest when N and L are most closely aligned, and falls off with the cosine of the angle between them.
• All lighting also falls off with the square of the distance from the light source.
• So, the diffuse term is $d^{-2} M_d (N \cdot L)$.
Local Lighting Math
• Specular lighting is view-dependent. In one simple formulation (Blinn shading), we determine the vector H halfway between E and L and evaluate $d^{-2} M_s (N \cdot H)^s$, where s indicates the shininess of the surface.

[Figure: unit vectors H, N, E, and L drawn at the surface, with H halfway between E and L]

Local Lighting Math
• The easiest way to think of H, the half-angle vector, is that if H and N were aligned, and the surface were a mirror, then the light would reflect straight to the eye.
• $(N \cdot H)^s$ can be thought of as representing a probability distribution of “microfacets,” whose normals are clustered around N but do vary. Smoother surfaces have higher s.
Local Lighting Math
• So, our full contribution from a light source is $d^{-2} (M_d (N \cdot L) + M_s (N \cdot H)^s)$.
• This may also be multiplied by a light color.
• We may also add in an emissive term $M_e$ for glowing objects.
• If the lack of interreflection makes things too dark, we may add an “ambient” term $L_a M_d$.
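A minimal sketch of the per-light formula above, assuming an unoccluded light and no light color, emissive, or ambient term. The function and parameter names are illustrative, and clamping the dot products at zero is an added assumption that the slides do not state.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def local_lighting(surface_pos, normal, light_pos, eye_pos, m_d, m_s, shininess):
    """Return the [R, G, B] value of d^-2 (M_d (N.L) + M_s (N.H)^s) for one light."""
    to_light = tuple(l - p for l, p in zip(light_pos, surface_pos))
    d2 = dot(to_light, to_light)                        # squared distance to the light
    L = normalize(to_light)
    N = normalize(normal)
    E = normalize(tuple(e - p for e, p in zip(eye_pos, surface_pos)))
    H = normalize(tuple(l + e for l, e in zip(L, E)))   # half-angle vector
    # Assumption: light arriving from behind the surface contributes nothing.
    n_dot_l = max(dot(N, L), 0.0)
    n_dot_h = max(dot(N, H), 0.0)
    return tuple((md * n_dot_l + ms * n_dot_h ** shininess) / d2
                 for md, ms in zip(m_d, m_s))
```

With the light and the eye both straight above the surface, N, L, and H coincide, so the result reduces to $(M_d + M_s) / d^2$ per channel.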
Approximations
• We may not compute this formula at every pixel on the screen, but only at the vertices of the object instead.
• The specular exponent may be evaluated using a power approximation rather than a real power function.
• The specular formula is already a cheesy hack…
More Approximations
• Shadows are hard. At the cost of much realism, they can be omitted or faked.
• Draw a little dark spot under an object
• Ambient is itself an approximation of real global illumination.
• $N \cdot L$ and $(N \cdot H)^s$ are idealizations; in reality, this can be an arbitrary 4-dimensional function called a “BRDF.”
• Radiosity is a global algorithm that handles diffuse lighting only.
• The term “global illumination” refers to global algorithms that handle specular lighting also.
• Specular makes things much more difficult.
• I will only discuss plain radiosity.
• Suppose there are n surfaces in the scene.
• Let $A_i$ be the area of surface i.
• Let $E_i$ be the amount of light energy emitted from surface i per unit time and area.
• Let $B_i$ be the amount of light energy emitted and/or reflected from surface i per unit time and area.
• Let $\rho_i$ be the diffuse albedo of surface i.
• Now, let $F_{ij}$ (called a “form factor”) be the fraction of light from surface i that reaches surface j.
• Then, for all i, we must have:

$$A_i B_i = A_i E_i + \rho_i \sum_{j \in [1,n]} A_j B_j F_{ji}$$

• This is just a linear system with n equations and n unknowns. Solve and you get the B’s.
That Easy?
• Well, not quite that easy.
• Solving the system of equations is $O(n^3)$.
• We still have to do local lighting.
• These become the $E_i$’s.
• We have to compute the $F_{ij}$ terms somehow.
• This turns out to be expensive.
• If the scene is static, precompute!
Computing Form Factors
• $F_{ij}$ turns out to be a big ugly integral over the area of both i and j.
• Worse, one of the terms in the integral is whether the two dA’s can see one another!
• So, no closed-form solution.
• Standard numerical integration is no good. A raycast per sample takes too long.
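For reference, the form factor between two diffuse patches is usually written as the double integral below. The symbols are standard notation rather than anything defined in the slides: $\theta_i$ and $\theta_j$ are the angles between each patch normal and the line joining the two area elements, $r$ is the distance between them, and $V$ is the visibility term (1 if the elements can see one another, 0 otherwise).

```latex
F_{ij} = \frac{1}{A_i} \int_{A_i} \int_{A_j}
         \frac{\cos\theta_i \, \cos\theta_j}{\pi r^2} \, V \, dA_j \, dA_i
```

The visibility term $V$ is what rules out a closed form in general scenes.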
Computing Form Factors
• The usual solution is called the “hemicube algorithm.”
• Render the scene from the point of view of the surface, in all directions. In effect, you are projecting onto a hemicube.
• Count up the number of times you can see each surface (weighted appropriately).
• Takes advantage of 3D acceleration!
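The "weighted appropriately" step can be sketched with the delta form factor weights commonly given for the hemicube method. This is an illustration under stated assumptions, not the author's implementation: it assumes a hemicube with a top face at z = 1 spanning [-1, 1] in x and y, four side faces of height 1, and an even resolution `res`. Rendering the scene to find which surface each pixel sees is omitted.

```python
import math

def hemicube_delta_form_factors(res):
    """Return (top, side) per-pixel weights for a unit hemicube.

    top is a res x res grid for the face at z = 1. side is a res x (res // 2)
    grid that is shared by the four side faces.
    """
    h = 2.0 / res                                        # pixel edge length
    area = h * h
    centers = [-1.0 + (i + 0.5) * h for i in range(res)]
    heights = [(k + 0.5) * h for k in range(res // 2)]   # z in (0, 1)
    top = [[area / (math.pi * (x * x + y * y + 1.0) ** 2) for x in centers]
           for y in centers]
    side = [[z * area / (math.pi * (y * y + z * z + 1.0) ** 2) for z in heights]
            for y in centers]
    return top, side

def total_weight(res):
    """Sum of all weights; should approach 1 as res grows."""
    top, side = hemicube_delta_form_factors(res)
    return sum(map(sum, top)) + 4.0 * sum(map(sum, side))
```

The form factor to a surface is then the sum of the weights of the pixels in which that surface is visible. The weights over the whole hemicube sum to approximately 1, which is a useful sanity check.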
• It so happens that $F_{ji} A_j = F_{ij} A_i$.
• This is a simple property of the integral for F.
• So we can simplify the radiosity equation:

$$A_i B_i = A_i E_i + \rho_i \sum_{j \in [1,n]} A_j B_j F_{ji}$$

$$A_i B_i = A_i E_i + \rho_i \sum_{j \in [1,n]} A_i B_j F_{ij}$$

$$B_i = E_i + \rho_i \sum_{j \in [1,n]} B_j F_{ij}$$

$$B = E + RB \quad \text{(where } R_{ij} = \rho_i F_{ij}\text{)}$$

• B = E + RB is just a matrix/vector equation.
• Direct solution: $B = (I - R)^{-1} E$
• Iterative solution:
• If E is a local lighting solution, then call it $B^{(0)}$.
• Now let $B^{(m+1)} = B^{(0)} + R B^{(m)}$.
• Then, $B^{(m)}$ is simply the lighting solution after m bounces! Since $\rho_i < 1$ for all i (conservation of energy), the $B^{(m)}$’s converge to B.
• If F is a dense matrix, then direct solution takes time $O(n^3)$, while k steps of iteration takes time $O(kn^2)$.
• Realistically, k is just a constant; say, 5.
• Iterative solution is practical with n ranging up to the hundreds of thousands!
• Iteration time is proportional to the number of nonzero entries of F.
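A minimal sketch of the iterative solution for a single color channel. The sparse row layout (`F_rows[i]` as a list of `(j, F_ij)` pairs) and the function name are assumptions made for illustration; they are not the storage format described in the slides.

```python
def solve_radiosity(E, rho, F_rows, bounces):
    """Iterate B <- E + R B with R_ij = rho_i * F_ij.

    F_rows[i] holds only the nonzero form factors of row i, so each bounce
    costs time proportional to the number of nonzero entries of F.
    """
    B = list(E)                      # zero bounces: local lighting only
    for _ in range(bounces):
        B = [E[i] + rho[i] * sum(f * B[j] for j, f in F_rows[i])
             for i in range(len(E))]
    return B
```

For two surfaces facing each other with `E = [1.0, 0.0]`, `rho = [0.5, 0.5]`, and `F_rows = [[(1, 0.5)], [(0, 0.5)]]`, one bounce gives `[1.0, 0.25]`, and further bounces approach the direct solution `[16/15, 4/15]`.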
Sparsity of Form Factors
• In a simple cube-shaped room, all form factors will be nonzero except for pairs on the same wall.
• So F is 5/6 nonzero. Not very encouraging…
• As more objects are added, F becomes sparser.
• As the scene expands beyond one room, F becomes much sparser.
• So iterative radiosity scales extremely well!
Storage and Precision
• If you grid a cube-shaped room 100x100 on each wall, and store F as a dense matrix of floats, that’s 14.4 GB. (!!!)
• Storing F as sparse helps a lot.
• Good for iteration speed too.
• Using smaller values than floats helps too.
• 16-bit fixed-point is good enough.
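The storage figure can be checked with a few lines of arithmetic. The last value is a derived estimate, not a number from the slides: it combines the 5/6 nonzero fraction with 16-bit entries and ignores any indexing overhead of a sparse format.

```python
walls = 6
grid = 100
n = walls * grid * grid                      # 60,000 surfaces

dense_float_bytes = n * n * 4                # 4-byte floats, dense
dense_fixed16_bytes = n * n * 2              # 16-bit fixed point, dense
# Only pairs on different walls are nonzero: 5/6 of the entries.
sparse_fixed16_bytes = n * n * 5 // 6 * 2

print(dense_float_bytes / 1e9)               # 14.4 (GB)
print(sparse_fixed16_bytes / 1e9)            # 6.0 (GB)
```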
RLE Encoding of Form Factors
• F tends to have runs of zeros and nonzeros.
• Smart traversal order of grids makes the runs longer.

[Figure: a grid traversal order labeled “Good”]

RLE Encoding of Form Factors
• My disk storage format:
• 1 byte: run length, run type
• Length up to 84, type is zero, 255, or 65535
• Then, variable # bytes with run data
• Compression ratio for my scene: 5.97:1
• My memory storage format:
• 2 bytes: run length up to 65535
• 2 bytes: run type (zero or 65535)
• 2N bytes: run data
• Compression ratio for my scene: 2.49:1
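The idea behind run-length encoding a row of form factors can be sketched as follows. This is a deliberately simplified, hypothetical scheme with only two run kinds (zero runs and literal data runs); it does not reproduce the byte layouts, run-length limits, or the 255 and 65535 run types listed above.

```python
def rle_encode(row):
    """Split a row of 16-bit form factors into ('zero', count) and ('data', values) runs."""
    runs = []
    i = 0
    while i < len(row):
        is_zero = row[i] == 0
        j = i
        while j < len(row) and (row[j] == 0) == is_zero:
            j += 1                   # extend the run while the zero/nonzero kind matches
        runs.append(("zero", j - i) if is_zero else ("data", row[i:j]))
        i = j
    return runs

def rle_decode(runs):
    """Rebuild the original row from the list of runs."""
    row = []
    for kind, payload in runs:
        row.extend([0] * payload if kind == "zero" else payload)
    return row
```

A zero run costs a fixed amount regardless of its length, which is why a traversal order that produces longer runs compresses better.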
Parallelization
• Split up the surfaces among the CPUs.
• Each CPU owns those rows of the form factor matrix.
• Each CPU computes its surfaces’ local lighting.
• Every iteration requires an all-to-all communication, so that each CPU has the full B vector.
• At present, my storage of F is unbalanced; compression ranges from 4.3:1 to 1.6:1.
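The row partitioning can be illustrated with a single-process simulation. This sketch uses no MPI; the loop over row ranges stands in for the CPUs, and the final concatenation stands in for the all-to-all exchange. Splitting rows evenly by count is an assumption, and the function names are illustrative.

```python
def split_rows(n, num_cpus):
    """Return contiguous [start, stop) row ranges, one per CPU."""
    base, extra = divmod(n, num_cpus)
    ranges, start = [], 0
    for rank in range(num_cpus):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def parallel_bounce(E, rho, F_rows, B, num_cpus):
    """One iteration B <- E + R B, computed slice by slice as each CPU would."""
    slices = []
    for start, stop in split_rows(len(E), num_cpus):
        # Each CPU reads the full previous B but only its own rows of F.
        slices.append([E[i] + rho[i] * sum(f * B[j] for j, f in F_rows[i])
                       for i in range(start, stop)])
    # Stand-in for the all-to-all step: every CPU ends up with all slices.
    return [b for part in slices for b in part]
```

Because compression varies from row to row, an even split by row count does not give each CPU the same amount of work; splitting by stored size would be one way to address the imbalance noted above.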
• Hand-written MMX assembly code (only main inner loop shown):

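inner_loop:
    prefetchnta [ebx+128]       ; prefetch upcoming data at ebx (non-temporal hint)
    prefetchnta [eax+128]       ; prefetch upcoming data at eax
    movq mm4, [eax]             ; load four 16-bit words
    pshufw mm7, mm4, 0xFF       ; broadcast word 3 to all four lanes
    pshufw mm6, mm4, 0xAA       ; broadcast word 2
    pshufw mm5, mm4, 0x55       ; broadcast word 1
    pshufw mm4, mm4, 0x00       ; broadcast word 0
    pmulhuw mm4, [ebx]          ; unsigned 16-bit multiply, keep high 16 bits
    pmulhuw mm5, [ebx+6]
    pmulhuw mm6, [ebx+12]
    pmulhuw mm7, [ebx+18]
    dec esi                     ; loop counter
    jnz inner_loop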
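The `pmulhuw` instruction keeps the high 16 bits of an unsigned 16-bit product, which amounts to multiplying by a fraction stored in 0.16 fixed point. The sketch below emulates that arithmetic in Python. The accumulation step is not visible in the excerpt above, so `gather_fixed` is an assumption about how the products might be summed, and both function names are illustrative.

```python
def mulhi_u16(a, b):
    """High 16 bits of the 32-bit product of two unsigned 16-bit values."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16

def gather_fixed(form_factors, radiosity_rgb):
    """Accumulate sum_j F_j * B_j per channel, with F_j in 0.16 fixed point."""
    total = [0, 0, 0]
    for f, rgb in zip(form_factors, radiosity_rgb):
        for c in range(3):
            total[c] += mulhi_u16(f, rgb[c])
    return total
```

For example, `mulhi_u16(0x8000, 100)` is 50, since 0x8000 represents one half.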

• Timed 8 CPUs doing 500 iterations.
• Portable C fixed-point kernel: 38.04 s
• Optimized MMX kernel: 21.64 s
• On most loaded CPU, works out to 528 million multiply-adds/s for the C version, 1.24 billion multiply-adds/s for MMX.
• But MMX code wastes 25% of them, so real rate is 928 million.
Local Lighting Implementation
• One raycast is required for each quad, for each light source! This can be expensive.
• To accelerate raycasts, I made a simplified version of my scene that was virtually indistinguishable for raycasting purposes.
• 13028 quads reduced to 120 polys, 110 cylinders
• Some cylinders used as geometry, some as bounding volumes
Overall Performance
• Again, 8 CPUs on 500 iterations:
• Iteration only: 21.64 s
• Communication only: 5.96 s
• Iteration plus communication: 26.84 s
• All computation: 65.7 s
• All computation plus communication: 64.38 s
Remarks on Performance
• The communication overlaps very well with the computation, to the point that it is actually a speedup. (!)
• MPI_Isend, MPI_Irecv are essential to achieving this.
• The $O(n)$ local lighting computation is actually taking much longer than the $O(n^2)$ radiosity computation.
• Local lighting is only pseudo-$O(n)$, because of the raycast cost, although for large scenes, raycast cost should still be much less than $O(n^2)$, due to other optimizations.
• Separate client application that runs on a Windows PC with OpenGL acceleration.
• Radiosity solver running on cluster is server.
• Original plan was that the frontend would send the scene to the server, and the server would use the scene provided.
• Since the cluster has no OpenGL acceleration, I was reluctantly forced to precompute form factors.
• All aspects of scene except form factors still sent to the server by the client; form factors are read from disk.
Client/Server Architecture
• User precomputes form factors and FTPs them to the Beowulf.
• Server listens on beowulf.lcs.mit.edu:5353.
• Client connects and sends scene information.
• Server reads form factors off of disk.
• Both open a network thread.
• Server streams radiosity to client via TCP/IP.
• Computation, rendering, and communication are completely decoupled.
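Streaming radiosity over TCP needs some way to mark where one update ends and the next begins. The sketch below shows one possible framing: a 4-byte count followed by that many 16-bit values, in network byte order. The wire format and function names are hypothetical; the slides do not describe the actual protocol.

```python
import socket
import struct

def send_frame(sock, values):
    """Send one radiosity update: a 4-byte count, then 16-bit unsigned values."""
    payload = struct.pack("!%dH" % len(values), *values)
    sock.sendall(struct.pack("!I", len(values)) + payload)

def recv_exact(sock, size):
    """Read exactly size bytes, since recv may return fewer than requested."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        data += chunk
    return data

def recv_frame(sock):
    """Receive one update sent by send_frame and return it as a list of ints."""
    (count,) = struct.unpack("!I", recv_exact(sock, 4))
    return list(struct.unpack("!%dH" % count, recv_exact(sock, 2 * count)))
```

In a decoupled design, the client's network thread would call `recv_frame` in a loop and hand each completed update to the renderer, so rendering never waits on the network.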
Frontend Features
• Per-vertex or per-pixel local lighting, with local viewer. Optional specular.
• Shadows implemented using stencil buffer.
• Display radiosity input/output (E and B).
• Bilinear filtering of radiosity solutions on grid-shaped surfaces.
• Ultra-high-quality mode where radiosity is used for indirect lighting only.
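Bilinear filtering of a radiosity solution stored on a grid can be sketched as below. On the client this would normally be done by the graphics hardware; the software version is only an illustration, and it assumes a grid of at least 2x2 values indexed as `grid[row][col]`.

```python
def bilinear(grid, u, v):
    """Sample grid[row][col] at fractional column u and fractional row v."""
    rows, cols = len(grid), len(grid[0])
    u = min(max(u, 0.0), cols - 1.0)         # clamp to the grid
    v = min(max(v, 0.0), rows - 1.0)
    c0 = min(int(u), cols - 2)               # top-left corner of the cell
    r0 = min(int(v), rows - 2)
    fu, fv = u - c0, v - r0                  # position inside the cell
    top = grid[r0][c0] * (1 - fu) + grid[r0][c0 + 1] * fu
    bottom = grid[r0 + 1][c0] * (1 - fu) + grid[r0 + 1][c0 + 1] * fu
    return top * (1 - fv) + bottom * fv
```

Sampling the center of a cell returns the average of its four corner values.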
Work Yet to be Done
• More complex scene would be nice. This one is 13K quads, and I should be able to do 50K.
• More optimization work on raycasting.