## PowerPoint Slideshow about 'Real-Time Parallel Radiosity' - angelo

What is Radiosity?

- A computer graphics technique for lighting
- Two types of lighting algorithms:
  - Local: easy, fast, but not realistic
  - Global: slow, difficult, but highest quality
- Radiosity is a global algorithm
- Global algorithms try to take into account interreflections in scenes.

Radiosity in Real Time?

- Local algorithms can easily run in real time, either in software or hardware.
- Computational demands grow with the number of surfaces and the number of lights. $10^6$ surfaces is not unreasonable!
- Most global algorithms take quadratic time in number of surfaces (all interactions!).

Local Doesn’t Mean Bad

Id Software’s Doom 3 (video capture)

…but Global is Better

State-of-the-art radiosity rendering from 1988 (5 hours render time!)

Local Lighting Math

- Consider a light source at some point in three-dimensional space, and a surface at some other location.
- How much does the light directly contribute to the brightness of the surface?
- Note that global algorithms still need to do local lighting as a first step.

Local Lighting Math

- Set up a 3-dimensional coordinate system centered on the surface. Define unit vectors L (light), N (normal), E (eye):
- L points towards the light.
- N is perpendicular to the surface.
- E points towards the viewer.

[Figure: unit vectors N, E, and L drawn at the surface]

Local Lighting Math

- If an object sits between the light and surface, the lighting contribution is zero.
- Otherwise…
- Surfaces generally reflect light in two ways: “diffuse” (dull) and “specular” (shiny).
- We can see three colors: red, green, blue.
- So let $M_d$ and $M_s$ be 3-vectors indicating how much of [R,G,B] is reflected in each way.

Local Lighting Math

- Diffuse lighting is independent of view angle. It is brightest when N and L are most closely aligned, and falls off with the cosine of the angle between them.
- All lighting also falls off with the square of the distance from the light source.
- So, the diffuse term is $d^{-2} M_d (N \cdot L)$, where $d$ is the distance to the light.

Local Lighting Math

- Specular lighting is view-dependent. In one simple formulation (Blinn shading), we determine the vector $H$ halfway between $E$ and $L$ and evaluate $d^{-2} M_s (N \cdot H)^s$, where $s$ indicates the shininess of the surface.

[Figure: half-angle vector H between E and L, alongside the normal N]

Local Lighting Math

- The easiest way to think of H, the half-angle vector, is that if H and N were aligned, and the surface were a mirror, then the light would reflect straight to the eye.
- $(N \cdot H)^s$ can be thought of as representing a probability distribution of “microfacets,” whose normals are clustered around $N$ but do vary. Smoother surfaces have higher $s$.

Local Lighting Math

- So, our full contribution from a light source is $d^{-2}\left(M_d (N \cdot L) + M_s (N \cdot H)^s\right)$.
- This may also be multiplied by a light color.
- We may also add in an emissive term $M_e$ for glowing objects.
- If the lack of interreflection makes things too dark, we may add an “ambient” term $L_a M_d$.
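
The formula above can be sketched in a few lines of Python. This is only an illustration, not the presenter's code: the function and parameter names are made up, the clamping of negative dot products to zero is an added assumption, and the emissive, ambient, and light-color terms are left out.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def local_lighting(surface_pos, normal, light_pos, eye_pos,
                   m_d, m_s, shininess, occluded=False):
    """Return the [R,G,B] contribution d^-2 (M_d (N.L) + M_s (N.H)^s)."""
    if occluded:
        # An object between the light and the surface: no contribution.
        return (0.0, 0.0, 0.0)
    to_light = tuple(l - p for l, p in zip(light_pos, surface_pos))
    d2 = dot(to_light, to_light)                      # squared distance d^2
    L = normalize(to_light)
    E = normalize(tuple(e - p for e, p in zip(eye_pos, surface_pos)))
    N = normalize(normal)
    H = normalize(tuple(l + e for l, e in zip(L, E)))  # half-angle vector
    # Assumption: light arriving from behind the surface contributes nothing.
    n_dot_l = max(dot(N, L), 0.0)
    n_dot_h = max(dot(N, H), 0.0)
    spec = n_dot_h ** shininess if n_dot_l > 0.0 else 0.0
    return tuple((md * n_dot_l + ms * spec) / d2 for md, ms in zip(m_d, m_s))

# Light and viewer both straight above the surface, two units away.
color = local_lighting((0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 5),
                       m_d=(0.8, 0.4, 0.2), m_s=(0.2, 0.2, 0.2), shininess=32)
```

Note that the sketch fails if L and E point in exactly opposite directions, since H then has zero length.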

Approximations

- We may not compute this formula at every pixel on the screen, but only at the vertices of the object instead.
- The specular exponent may be evaluated using a power approximation rather than a real power function.
- The specular formula is already a cheesy hack…

More Approximations

- Shadows are hard. At the cost of much realism, they can be omitted or faked.
  - Draw a little dark spot under an object
- Ambient is itself an approximation of real global illumination.
- $N \cdot L$ and $(N \cdot H)^s$ are idealizations. In reality, the reflectance can be an arbitrary 4-dimensional function called a “BRDF.”

Radiosity

- Radiosity is a global algorithm that handles diffuse lighting only.
- The term “global illumination” refers to global algorithms that handle specular lighting also.
- Specular makes things much more difficult.
- I will only discuss plain radiosity.

Radiosity in a Nutshell

- Suppose there are n surfaces in the scene.
- Let $A_i$ be the area of surface $i$.
- Let $E_i$ be the amount of light energy emitted from surface $i$ per unit time and area.
- Let $B_i$ be the amount of light energy emitted and/or reflected from surface $i$ per unit time and area.
- Let $\rho_i$ be the diffuse albedo of surface $i$.

Radiosity in a Nutshell

- Now, let $F_{ij}$ (called a “form factor”) be the fraction of light from surface $i$ that reaches surface $j$.
- Then, for all $i$, we must have:

$$A_i B_i = A_i E_i + \rho_i \sum_{j \in [1,n]} A_j B_j F_{ji}$$

- This is just a linear system with $n$ equations and $n$ unknowns. Solve and you get the $B$’s.

That Easy?

- Well, not quite that easy.
- Solving the system of equations is $O(n^3)$.
  - So we’ll iterate instead.
- We still have to do local lighting.
  - These become the $E_i$’s.
- We have to compute the $F_{ij}$ terms somehow.
  - This turns out to be expensive.
  - If the scene is static, precompute!

Computing Form Factors

- $F_{ij}$ turns out to be a big ugly integral over the area of both $i$ and $j$.
- Worse, one of the terms in the integral is whether the two dA’s can see one another!
- So, no closed-form solution.
- Standard numerical integration is no good. A raycast per sample takes too long.
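
For reference, the integral is usually written in the following standard form. The symbols are not from the slides and are introduced here: $\theta_i$ and $\theta_j$ are the angles between each surface normal and the line joining the two area elements, $r$ is the distance between the elements, and $V$ is the visibility term (1 if the elements can see each other, 0 otherwise).

$$F_{ij} = \frac{1}{A_i} \int_{A_i} \int_{A_j} \frac{\cos\theta_i \, \cos\theta_j}{\pi r^2} \, V \, dA_j \, dA_i$$

Swapping $i$ and $j$ changes only the $1/A_i$ prefactor, which is where the reciprocity relation used later in the deck comes from.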

Computing Form Factors

- The usual solution is called the “hemicube algorithm.”
- Render the scene from the point of view of the surface, in all directions. In effect, you are projecting onto a hemicube.
- Count up the number of times you can see each surface (weighted appropriately).
- Takes advantage of 3D acceleration!
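
The "weighted appropriately" step can be illustrated with the standard per-pixel weights for a unit hemicube. This is a sketch only, not the presenter's implementation: the function name and the resolution parameter are hypothetical, and the visibility rendering itself is omitted. Summed over the whole hemicube, the weights should come to approximately 1.

```python
import math

def hemicube_weights(res):
    """Delta form factors for a unit hemicube.

    The top face spans [-1, 1] x [-1, 1] at height 1 with res x res pixels.
    Each side face spans [-1, 1] horizontally and (0, 1) vertically,
    with res x (res // 2) pixels of the same size.
    """
    step = 2.0 / res
    pixel_area = step * step
    top = []
    for iy in range(res):
        for ix in range(res):
            x = -1.0 + (ix + 0.5) * step
            y = -1.0 + (iy + 0.5) * step
            top.append(pixel_area / (math.pi * (x * x + y * y + 1.0) ** 2))
    side = []
    for iz in range(res // 2):
        for iy in range(res):
            y = -1.0 + (iy + 0.5) * step
            z = (iz + 0.5) * step
            side.append(z * pixel_area / (math.pi * (y * y + z * z + 1.0) ** 2))
    return top, side

top, side = hemicube_weights(32)
total = sum(top) + 4 * sum(side)   # one top face plus four identical sides
```

To estimate a form factor, one would add up the weights of all pixels in which a given surface is visible.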

Simplified Radiosity Equation

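- It so happens that $F_{ji} A_j = F_{ij} A_i$.
- This is a simple property of the integral for $F$.
- So we can simplify the radiosity equation:

$$A_i B_i = A_i E_i + \rho_i \sum_{j \in [1,n]} A_j B_j F_{ji}$$

$$A_i B_i = A_i E_i + \rho_i \sum_{j \in [1,n]} A_i B_j F_{ij}$$

$$B_i = E_i + \rho_i \sum_{j \in [1,n]} B_j F_{ij}$$

$B = E + RB$ (where $R_{ij} = \rho_i F_{ij}$ and $\rho_i$ is the diffuse albedo of surface $i$)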

Solving the Radiosity Equation

- B = E + RB is just a matrix/vector equation.
- Direct solution: $B = (I - R)^{-1} E$
- Iterative solution:
  - If $E$ is a local lighting solution, then call it $B^{(0)}$.
  - Now let $B^{(k+1)} = B^{(0)} + R B^{(k)}$.
  - Then, $B^{(k)}$ is simply the lighting solution after $k$ bounces! Since the albedo $\rho_i < 1$ for every surface $i$ (conservation of energy), the $B^{(k)}$’s converge to $B$.
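
A minimal sketch of the iterative solution, using plain Python lists and a dense form factor matrix. The function name and the tiny two-surface scene are invented for illustration, and a single scalar stands in for each surface's [R,G,B] value.

```python
def iterate_radiosity(E, albedo, F, bounces):
    """Repeat B <- E + R B, where R[i][j] = albedo[i] * F[i][j]."""
    n = len(E)
    B = list(E)                      # zero bounces: local lighting only
    for _ in range(bounces):
        B = [E[i] + albedo[i] * sum(F[i][j] * B[j] for j in range(n))
             for i in range(n)]
    return B

# Two surfaces that each send half of their light to the other.
E = [1.0, 0.0]                       # only surface 0 is lit directly
albedo = [0.5, 0.5]
F = [[0.0, 0.5],
     [0.5, 0.0]]

one_bounce = iterate_radiosity(E, albedo, F, 1)
two_bounces = iterate_radiosity(E, albedo, F, 2)
converged = iterate_radiosity(E, albedo, F, 50)
```

For this scene the exact answer is 16/15 and 4/15, and the iterates approach it quickly because each bounce is scaled by the albedo.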

Iterative vs. Direct Radiosity

- If $F$ is a dense matrix, then direct solution takes time $O(n^3)$, while $k$ steps of iteration take time $O(kn^2)$.
- Realistically, k is just a constant; say, 5.
- Iterative solution is practical with n ranging up to the hundreds of thousands!
- Iteration time is proportional to the number of nonzero entries of F.

Sparsity of Form Factors

- In a simple cube-shaped room, all form factors will be nonzero except for pairs on the same wall.
- So F is 5/6 nonzero. Not very encouraging…
- As more objects are added, F becomes sparser.
- As the scene expands beyond one room, F becomes much sparser.
- So iterative radiosity scales extremely well!
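
To see why sparsity helps, here is a hedged sketch of one bounce that stores only the nonzero entries of each row, so the work per bounce is proportional to the number of nonzeros. The storage layout (a list of `(column, value)` pairs per row) and all names are illustrative choices, not the format used in the talk.

```python
def to_sparse_rows(R):
    """Keep only the nonzero entries of each row as (column, value) pairs."""
    return [[(j, r) for j, r in enumerate(row) if r != 0.0] for row in R]

def count_nonzeros(rows):
    return sum(len(row) for row in rows)

def sparse_bounce(E, rows, B):
    """One step of B <- E + R B that touches only stored entries."""
    return [E[i] + sum(r * B[j] for j, r in rows[i]) for i in range(len(E))]

# Surfaces 0 and 1 see each other; surface 2 sees nothing.
R = [[0.0, 0.25, 0.0],
     [0.25, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
E = [1.0, 0.0, 0.5]

rows = to_sparse_rows(R)
B = sparse_bounce(E, rows, E)
```

Here only 2 of the 9 matrix entries are visited per bounce.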

Storage and Precision

- Radiosity hogs memory.
- If you grid a cube-shaped room 100x100 on each wall, and store F as a dense matrix of floats, that’s 14.4 GB. (!!!)
- Storing F as sparse helps a lot.
  - Good for iteration speed too.
- Using smaller values than floats helps too.
  - 16-bit fixed-point is good enough.
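
The memory figure can be checked with a few lines of arithmetic. The helper name is made up, and the sketch assumes 4-byte floats and decimal gigabytes.

```python
def dense_matrix_bytes(grid_side, walls=6, bytes_per_entry=4):
    """Bytes needed for a dense n x n form factor matrix of a gridded cube room."""
    n = walls * grid_side * grid_side      # total number of patches
    return n * n * bytes_per_entry

float_bytes = dense_matrix_bytes(100)                      # 4-byte floats
fixed_bytes = dense_matrix_bytes(100, bytes_per_entry=2)   # 16-bit fixed-point
float_gb = float_bytes / 1e9
```

With 60,000 patches the dense matrix has 3.6 billion entries, so halving the entry size alone still leaves 7.2 GB. That is why sparse storage matters as well.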

RLE Encoding of Form Factors

- F tends to have runs of zeros and nonzeros.
- Smart traversal order of grids makes the runs longer.

[Figure: a bad and a good traversal order for the grid]

RLE Encoding of Form Factors

- My disk storage format:
  - 1 byte: run length, run type
  - Length up to 84, type is zero, 255, or 65535
  - Then, variable # bytes with run data
  - Compression ratio for my scene: 5.97:1
- My memory storage format:
  - 2 bytes: run length up to 65535
  - 2 bytes: run type (zero or 65535)
  - 2N bytes: run data
  - Compression ratio for my scene: 2.49:1
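
The general idea of run-length encoding a row of form factors can be sketched as follows. This is a simplified scheme for illustration and is not the exact format described above: it distinguishes only runs of zeros from runs of literal nonzero values, it does not cap the run length at 65535, and every name is hypothetical. The size estimate assumes a 4-byte header per run and 2 bytes per stored value.

```python
def rle_encode(values):
    """Split a row into ("zero", count) and ("data", [values]) runs."""
    runs = []
    i, n = 0, len(values)
    while i < n:
        j = i
        if values[i] == 0:
            while j < n and values[j] == 0:
                j += 1
            runs.append(("zero", j - i))
        else:
            while j < n and values[j] != 0:
                j += 1
            runs.append(("data", values[i:j]))
        i = j
    return runs

def rle_decode(runs):
    out = []
    for kind, payload in runs:
        if kind == "zero":
            out.extend([0] * payload)
        else:
            out.extend(payload)
    return out

def encoded_size_bytes(runs, header_bytes=4, bytes_per_value=2):
    """Header for every run, plus payload bytes for literal runs only."""
    return sum(header_bytes + (bytes_per_value * len(payload) if kind == "data" else 0)
               for kind, payload in runs)

row = [0] * 100 + [1, 2, 3] + [0] * 50
runs = rle_encode(row)
```

For this row the raw 16-bit storage would be 306 bytes, against 18 bytes encoded. Rows that alternate rapidly between zero and nonzero entries can grow instead of shrinking, which is why traversal order matters.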

Parallelization

- Split up the surfaces among the CPUs.
- Each CPU owns those rows of the form factor matrix.
- Each CPU computes its surfaces’ local lighting.
- Every iteration requires an all-to-all communication, so that each CPU has the full B vector.
- At present, my storage of F is unbalanced; compression ranges from 4.3:1 to 1.6:1.
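
The row partitioning can be simulated without MPI. In the sketch below each "worker" is just a loop iteration that updates its own block of rows, and concatenating the blocks stands in for the all-to-all exchange. The function names and the even split are assumptions for illustration; the talk's actual partitioning is not shown.

```python
def partition_rows(n, workers):
    """Split row indices 0..n-1 into contiguous, nearly equal blocks."""
    base, extra = divmod(n, workers)
    blocks, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def parallel_bounce(E, R, B, workers):
    """One bounce in which each worker updates only the rows it owns."""
    pieces = []
    for rows in partition_rows(len(E), workers):
        # A worker needs the full previous B but only its own rows of R.
        pieces.append([E[i] + sum(R[i][j] * B[j] for j in range(len(B)))
                       for i in rows])
    # Stand-in for the all-to-all step: everyone ends up with the full vector.
    return [b for piece in pieces for b in piece]

E = [1.0, 0.0, 0.5, 0.0]
R = [[0.0, 0.25, 0.0, 0.0],
     [0.25, 0.0, 0.25, 0.0],
     [0.0, 0.25, 0.0, 0.25],
     [0.0, 0.0, 0.25, 0.0]]

serial = parallel_bounce(E, R, E, workers=1)
split = parallel_bounce(E, R, E, workers=3)
```

Splitting by row count balances the number of surfaces but not the number of nonzero form factors per worker, which is the imbalance the last bullet describes.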

Radiosity Iteration Kernel

- Hand-written MMX assembly code (only main inner loop shown):

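      inner_loop:
          prefetchnta [ebx+128]       ; prefetch upcoming data on both input streams
          prefetchnta [eax+128]
          movq    mm4, [eax]          ; load four 16-bit words (presumably form factors)
          pshufw  mm7, mm4, 0xFF      ; broadcast word 3 to all four lanes
          pshufw  mm6, mm4, 0xAA      ; broadcast word 2
          pshufw  mm5, mm4, 0x55      ; broadcast word 1
          pshufw  mm4, mm4, 0x00      ; broadcast word 0
          pmulhuw mm4, [ebx]          ; unsigned 16x16 multiply, keep the high 16 bits
          pmulhuw mm5, [ebx+6]        ; operands are 6 bytes apart, so each 4-word
          pmulhuw mm6, [ebx+12]       ; multiply overlaps the next operand by one word
          pmulhuw mm7, [ebx+18]
          paddw   mm0, mm4            ; accumulate into four running sums
          paddw   mm1, mm5
          paddw   mm2, mm6
          paddw   mm3, mm7
          add     eax, 8              ; advance by four 16-bit words
          add     ebx, 24             ; advance by four 6-byte operands
          dec     esi
          jnz     inner_loop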
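
The arithmetic in the loop above can be mimicked in Python. `pmulhuw` keeps the high 16 bits of an unsigned 16-bit product and `paddw` wraps at 16 bits; both are emulated directly. The data layout is an assumption drawn from the 6-byte stride: each operand is taken to be a 16-bit [R,G,B] triple, and the fourth lane of every multiply is treated as wasted. Function names are hypothetical.

```python
def pmulhuw(a, b):
    """High 16 bits of the unsigned 16 x 16 product, as the instruction computes."""
    return (a * b) >> 16

def gather_row(form_factors, radiosity_rgb):
    """Accumulate sum_j F_ij * B_j per color channel in 16-bit fixed point."""
    acc = [0, 0, 0]
    for f, rgb in zip(form_factors, radiosity_rgb):
        for channel in range(3):
            # paddw wraps around at 16 bits, so mask the running sum.
            acc[channel] = (acc[channel] + pmulhuw(f, rgb[channel])) & 0xFFFF
    return acc

factors = [0x8000, 0x4000]                       # 0.5 and 0.25 in 0.16 fixed point
incoming = [(0x8000, 0x0000, 0xFFFF),
            (0x8000, 0x8000, 0x0000)]
result = gather_row(factors, incoming)
```

Because only the high half of each product is kept, every multiply truncates; for example, 0.5 times the largest value 0xFFFF yields 32767 rather than 32767.5.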

Radiosity Kernel Performance

- Timed 8 CPUs doing 500 iterations.
- Portable C fixed-point kernel: 38.04 s
- Optimized MMX kernel: 21.64 s
- On most loaded CPU, works out to 528 million multiply-adds/s for the C version, 1.24 billion multiply-adds/s for MMX.
- But MMX code wastes 25% of them, so real rate is 928 million.
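
These rates are consistent with one another, as a quick check shows. The sketch assumes both kernels perform the same amount of useful work, so the useful MMX rate is the C rate scaled by the time ratio; variable names are illustrative.

```python
c_seconds = 38.04
mmx_seconds = 21.64
c_rate = 528e6                 # multiply-adds per second, C kernel

speedup = c_seconds / mmx_seconds
useful_mmx_rate = c_rate * speedup          # same useful work in less time
issued_mmx_rate = useful_mmx_rate / 0.75    # a quarter of issued work is wasted
```

The useful rate comes out at about 928 million per second and the issued rate at about 1.24 billion, matching the figures quoted above, for a speedup of roughly 1.76 over the C kernel.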

Local Lighting Implementation

- One raycast is required for each quad, for each light source! This can be expensive.
- To accelerate raycasts, I made a simplified version of my scene that was virtually indistinguishable for raycasting purposes.
- 13028 quads reduced to 120 polys, 110 cylinders
- Some cylinders used as geometry, some as bounding volumes

Overall Performance

- Again, 8 CPUs on 500 iterations:
- Iteration only: 21.64 s
- Communication only: 5.96 s
- Iteration plus communication: 26.84 s
- All computation: 65.7 s
- All computation plus communication: 64.38 s

Remarks on Performance

- The communication overlaps very well with the computation, to the point that the run with communication is actually slightly faster (64.38 s versus 65.7 s). (!)
- MPI_Isend, MPI_Irecv are essential to achieving this.
- The $O(n)$ local lighting computation is actually taking much longer than the $O(n^2)$ radiosity computation.
- Local lighting is only pseudo-$O(n)$, because of the raycast cost. For large scenes, though, raycast cost should still be much less than $O(n^2)$, due to other optimizations.

Radiosity Frontend

- Separate client application that runs on a Windows PC with OpenGL acceleration.
- Radiosity solver running on cluster is server.
- Original plan was that the frontend would send the scene to the server, and the server would use the scene provided.
- Since the cluster has no OpenGL acceleration, I was reluctantly forced to precompute form factors.
- All aspects of scene except form factors still sent to the server by the client; form factors are read from disk.

Client/Server Architecture

- User precomputes form factors and FTPs them to the Beowulf.
- Server listens on beowulf.lcs.mit.edu:5353.
- Client connects and sends scene information.
- Server reads form factors off of disk.
- Both open a network thread.
- Server streams radiosity to client via TCP/IP.
- Computation, rendering, and communication are completely decoupled.

Frontend Features

- Per-vertex or per-pixel local lighting, with local viewer. Optional specular.
- Shadows implemented using stencil buffer.
- Display radiosity input/output (E and B).
- Bilinear filtering of radiosity solutions on grid-shaped surfaces.
- Ultra-high-quality mode where radiosity is used for indirect lighting only.
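
Bilinear filtering of a gridded radiosity solution can be sketched as below. In the frontend this would normally be done by the graphics hardware; the Python version only illustrates the interpolation, and its name, the clamping at the grid edges, and the coordinate convention (u across columns, v across rows, in grid units) are assumptions.

```python
def bilinear(grid, u, v):
    """Interpolate a value at column u, row v of a 2D grid of samples."""
    rows, cols = len(grid), len(grid[0])
    x = min(max(u, 0.0), cols - 1)     # clamp to the grid
    y = min(max(v, 0.0), rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

radiosity = [[0.0, 1.0],
             [2.0, 3.0]]
center = bilinear(radiosity, 0.5, 0.5)
```

At the center of the four samples the result is their average.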

Work Yet to be Done

- More complex scene would be nice. This one is 13K quads, and I should be able to do 50K.
- More optimization work on raycasting.
- Better load balancing.
- Optimization of some modes on frontend, so they run reasonably on my laptop’s GeForce2 Go, not just on a GeForce4 Ti…
- Alleviation of ugly banding on certain lighting modes caused by 8-bit-per-component precision.

Conclusions

- Real-time radiosity is feasible.
- Not tomorrow, but today.
- If today’s cluster is tomorrow’s desktop, real-time radiosity could start showing up in real applications not too many years from now.
- Biggest limitation may be the ability to compute form factors efficiently.
- Faster graphics hardware will make this happen.
