
Cg and Hardware Accelerated Shading




Presentation Transcript


  1. Cg and Hardware Accelerated Shading Cem Cebenoyan

  2. Overview • Cg Overview • Where we are in hardware today • Physical Simulation on GPU • GeForce FX / Cg Demos • Advanced hair and skin rendering in “Dawn” • Adaptive subdivision surfaces and ambient occlusion shading in “Ogre” • Procedural shading in “Time Machine” • Depth of field and post-processing effects in “Toys” • Order-independent transparency (OIT)

  3. What is Cg? • A high-level language for controlling parts of the graphics pipeline of modern GPUs • Today, this includes the vertex transformation and fragment processing units of the pipeline • Very C-like • Only simpler • Native support for vectors, matrices, dot products, reflection vectors, etc. • Similar in scope to RenderMan • But notably different, to match the way hardware accelerators work
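
To give a feel for the language, here is a minimal Cg vertex program of the canonical kind (parameter names are illustrative): it transforms a position by a uniform matrix using the native mul intrinsic.

    // Minimal Cg vertex program: vectors, matrices and the mul
    // intrinsic are native to the language
    float4 main(float4 position : POSITION,
                uniform float4x4 modelViewProj) : POSITION
    {
        return mul(modelViewProj, position);
    }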

  4. Cg Pipeline Overview • A graphics program written in Cg (“C” for Graphics) is compiled and optimized into low-level graphics “assembly code”

  5. Graphics Data Flow • Application → Vertex Program (Cg program) → Fragment Program (Cg program) → Framebuffer

    //
    // Diffuse lighting
    //
    float d = dot(normalize(frag.N), normalize(frag.L));
    if (d < 0) d = 0;
    c = d * f4tex2D(t, frag.uv) * diffuse;
    ...

  6. Graphics Hardware Today • Fully programmable vertex processing • Full IEEE 32-bit floating point processing • Native support for mul, dp3, dp4, rsq, pow, sin, cos... • Full support for branching, looping, subroutines • Fully programmable pixel processing • IEEE 32-bit and 16-bit (s10e5) math supported • Same native math ops as vertex, plus texture fetch and derivative instructions • No branching, but a >1000-instruction limit • Floating point textures / frame buffers • No blending / filtering yet • ~500 MHz core clock

  7. Physical Simulation • Simple cellular automata-like simulations are possible on NV20 class hardware (e.g. Game of Life, Greg James’ water simulation, Mark Harris’ CML work) • Use textures to represent physical quantities (e.g. displacement, velocity, force) on a regular grid • Multiple texture lookups allow access to neighbouring values • Pixel shader calculates new values, renders results back to texture • Each rendering pass draws a single quad, calculating next time step in simulation
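
A minimal sketch of this pattern, written in the style of the cloth shader shown later in this talk (the connector and texture names are illustrative, and the diffusion-style update rule is only an example):

    // Illustrative connector/texture names, in the style of the
    // cloth shader later in this talk; the diffusion-like update
    // rule is only an example
    myFragout main(v2fconnector In,
                   uniform texobjRECT state_tex)
    {
        myFragout Out;
        float2 s = In.TEX0.xy;
        // read this cell and its four neighbours
        float3 c  = f3texRECT(state_tex, s);
        float3 n1 = f3texRECT(state_tex, s + float2( 1.0,  0.0));
        float3 n2 = f3texRECT(state_tex, s + float2(-1.0,  0.0));
        float3 n3 = f3texRECT(state_tex, s + float2( 0.0,  1.0));
        float3 n4 = f3texRECT(state_tex, s + float2( 0.0, -1.0));
        // example update: relax toward the neighbour average
        Out.COL.xyz = lerp(c, 0.25*(n1 + n2 + n3 + n4), 0.5);
        return Out;
    }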

  8. Physical Simulation • Problem: 8 bit precision on NV20 is not enough, causes drifting, stability problems • Float precision on NV30 allows GPU physics to match CPU accuracy • New fragment programming model (longer programs, flexible dependent texture reads) allows much more interesting simulations

  9. Example: Cloth Simulation Shader • Uses Verlet integration (see: Jakobsen, GDC 2001) • Avoids storing explicit velocity • newx = x + (x - oldx)*damping + a*dt*dt • Not always accurate, but stable! • Store current and previous position of each particle in 2 RGB float textures • Fragment program calculates new position, writes result to float buffer • Copy float buffer back to texture for next iteration (could use render-to-texture instead) • Swap current and previous textures

  10. Cloth Shader Demo

  11. Cloth Simulation Shader • 2 passes: • 1. Perform integration • 2. Apply constraints: • Floor constraint • Sphere constraint • Distance constraints between particles • Read back float frame buffer using glReadPixels • Draw particles and constraints

  12. Cloth Simulation Cg Code (1st pass)

    void Integrate(inout float3 x, float3 oldx, float3 a,
                   float timestep2, float damping)
    {
        x = x + damping*(x - oldx) + a*timestep2;
    }

    myFragout main(v2fconnector In,
                   uniform texobjRECT x_tex,
                   uniform texobjRECT ox_tex,
                   uniform float timestep,
                   uniform float damping,
                   uniform float3 gravity)
    {
        myFragout Out;
        float2 s = In.TEX0.xy;

        // get current and previous position
        float3 x = f3texRECT(x_tex, s);
        float3 oldx = f3texRECT(ox_tex, s);

        // move the particle
        Integrate(x, oldx, gravity, timestep*timestep, damping);

        Out.COL.xyz = x;
        return Out;
    }

  13. Cloth Simulation Cg Code (2nd pass)

    // constrain particle to be fixed distance from another particle
    void DistanceConstraint(float3 x, inout float3 newx, float3 x2,
                            float restlength, float stiffness)
    {
        float3 delta = x2 - x;
        float deltalength = length(delta);
        float diff = (deltalength - restlength) / deltalength;
        newx = newx + delta*stiffness*diff;
    }

    // constrain particle to be outside sphere
    void SphereConstraint(inout float3 x, float3 center, float r)
    {
        float3 delta = x - center;
        float dist = length(delta);
        if (dist < r) {
            x = center + delta*(r / dist);
        }
    }

    // constrain particle to be above floor
    void FloorConstraint(inout float3 x, float level)
    {
        if (x.y < level) {
            x.y = level;
        }
    }

  14. Cloth Simulation Cg Code (cont.)

    myFragout main(v2fconnector In,
                   uniform texobjRECT x_tex,
                   uniform texobjRECT ox_tex,
                   uniform float dist,
                   uniform float stiffness)
    {
        myFragout Out;
        float2 s = In.TEX0.xy;

        // get current position
        float3 x = f3texRECT(x_tex, s);

        // satisfy constraints
        FloorConstraint(x, 0.0f);
        SphereConstraint(x, float3(0.0, 2.0, 0.0), 1.0f);

        // get positions of neighbouring particles
        float3 x1 = f3texRECT(x_tex, s + float2( 1.0,  0.0));
        float3 x2 = f3texRECT(x_tex, s + float2(-1.0,  0.0));
        float3 x3 = f3texRECT(x_tex, s + float2( 0.0,  1.0));
        float3 x4 = f3texRECT(x_tex, s + float2( 0.0, -1.0));

        // apply distance constraints (the 0/31 bounds assume a 32x32 grid)
        float3 newx = x;
        if (s.x < 31) DistanceConstraint(x, newx, x1, dist, stiffness);
        if (s.x > 0)  DistanceConstraint(x, newx, x2, dist, stiffness);
        if (s.y < 31) DistanceConstraint(x, newx, x3, dist, stiffness);
        if (s.y > 0)  DistanceConstraint(x, newx, x4, dist, stiffness);

        Out.COL.xyz = newx;
        return Out;
    }

  15. Physical Simulation – Future Work • Limitation - only one destination buffer, can only modify position of one particle at a time • Could use pack instructions to store 2 vec4h (8 half floats) in 128 bit float buffer • Could also use additional textures to encode particle masses, stiffness, constraints between arbitrary particles (rigid bodies) • “float buffer to vertex array” extension offers possibility of directly interpreting results as geometry without any CPU intervention! • Collision detection with meshes is hard
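
As a sketch of the packing idea, assuming Cg's fp30 pack_2half intrinsic (which packs two half-precision values into one 32-bit float):

    // Assumes Cg's fp30 pack_2half intrinsic: two half4 quantities
    // fit in a single 128-bit float4 render target
    float4 PackTwoHalf4(half4 a, half4 b)
    {
        return float4(pack_2half(a.xy), pack_2half(a.zw),
                      pack_2half(b.xy), pack_2half(b.zw));
    }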

  16. Demos Introduction • Developed 4 demos for the launch of GeForce FX • “Dawn” • “Toys” • “Time Machine” • “Ogre” (Spellcraft Studio)

  17. Characters Look Better With Hair

  18. Rendering Hair • Two options: • 1) Volumetric (texture) • 2) Geometric (lines) • We have used volumetric approximations (shells and fins) in the past (e.g. Wolfman demo) • Doesn’t work well for long hair • We considered using textured ribbons (popular in Japanese video games). Alpha sorting is a pain. • Performance of GeForce FX finally lets us render hair as geometry

  19. Rendering Hair as Lines • Each hair strand is rendered as a line strip (2-20 vertices, depending on curvature) • Problem: lines are a minimum of 1 pixel thick, regardless of distance from camera • Not possible to change line width per vertex • Can use camera-facing triangle strips, but these require twice the number of vertices, and have aliasing problems

  20. Anti-Aliasing • Two methods of anti-aliasing lines in OpenGL • GL_LINE_SMOOTH • High quality, but requires blending, sorting geometry • GL_MULTISAMPLE • Usually lower quality, but order independent • We used multisample anti-aliasing with “alpha to coverage” mode • By fading alpha to zero at the ends of hairs, coverage and apparent thickness decreases • “SAMPLE_ALPHA_TO_COVERAGE_ARB” is part of the ARB_multisample extension
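
A hedged sketch of the alpha fade in Cg (the per-vertex tip parameter and semantics are illustrative; the demo's actual shader is not shown here):

    // Per-vertex tip parameter (0 at root, 1 at tip) is an
    // assumption; alpha falls to zero toward the tip, so
    // alpha-to-coverage makes strands taper and blend
    float4 main(float3 hairColor : COLOR0,
                float tipParam : TEXCOORD0) : COLOR
    {
        float alpha = saturate(1.0 - tipParam);
        return float4(hairColor, alpha);
    }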

  21. Hair Without Antialiasing

  22. Hair With Multisample Antialiasing

  23. Hair Shading • Hair is lit with simple anisotropic shader (Heidrich and Seidel model) • Low specular exponent, dim highlight looks best • Black hair = no shadows! • Self-shadowing hair is hard • Deep shadow maps • Opacity shadow maps • Top of head is painted black to avoid skin showing through • We also had a very short hair style, which helps
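
For reference, a common formulation of the Heidrich-Seidel strand-lighting model, sketched in Cg (parameter names and the dim specular scale are illustrative; the demo's exact shader is not shown in the slides):

    // Common formulation of the Heidrich-Seidel anisotropic model
    // for a strand with tangent T, light dir L, view dir V
    float3 HairLighting(float3 T, float3 L, float3 V,
                        float3 hairColor, float specExp)
    {
        float TdotL = dot(T, L);
        float TdotV = dot(T, V);
        // sines of the angles between the tangent and L / V
        float sinTL = sqrt(saturate(1.0 - TdotL*TdotL));
        float sinTV = sqrt(saturate(1.0 - TdotV*TdotV));
        float diffuse  = sinTL;
        // low exponent, dim highlight (as noted above)
        float specular = 0.2 * pow(saturate(sinTL*sinTV - TdotL*TdotV),
                                   specExp);
        return hairColor * diffuse + specular;
    }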

  24. Hair Styling is Important

  25. Hair Styling • Difficult to position 50,000 individual curves by hand • Typical solution is to define a small number of control hairs, which are then interpolated across the surface to produce render hairs • We developed a custom tool for hair styling • Commercial hair applications have poor styling tools and are not designed for real time output

  26. Hair Styling • Scalp is defined as a polygon mesh • Hairs are represented as cubic Bezier curves • Control hairs are defined for each vertex • Render hairs are interpolated across triangles using barycentric coordinates • Number of generated hairs is based on triangle area to maintain constant density • Can add noise to interpolated hairs to add variation
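
A minimal sketch of the interpolation step (function names are illustrative): blend corresponding control points of the three control hairs with barycentric weights, then evaluate the resulting cubic Bezier curve.

    // Blend corresponding control points of the three control hairs
    // at the triangle's vertices using barycentric weights b
    // (b.x + b.y + b.z == 1)
    float3 InterpControlPoint(float3 cp0, float3 cp1, float3 cp2,
                              float3 b)
    {
        return b.x*cp0 + b.y*cp1 + b.z*cp2;
    }

    // Evaluate the resulting cubic Bezier strand at parameter t in [0,1]
    float3 EvalBezier(float3 p0, float3 p1, float3 p2, float3 p3,
                      float t)
    {
        float u = 1.0 - t;
        return u*u*u*p0 + 3.0*u*u*t*p1 + 3.0*u*t*t*p2 + t*t*t*p3;
    }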

  27. Hair Styling Tool • Provides a simple UI for styling hair • Combing tools • Lengthen / shorten • Straighten / mess up • Uses a simple physics simulation based on Verlet integration (Jakobsen, GDC 2001) • Physics is run on control hairs only • Collision detection done with ellipsoids

  28. Dawn Demo • Show demo

  29. The Ogre Demo • A real-time preview of Spellcraft Studio’s in-production short movie “Yeah!” • Created in 3D Studio MAX • Used Character Studio for animation, plus the Stitch plug-in for cloth simulation • Original movie was rendered offline with the Brazil renderer using global illumination • Available at: www.yeahthemovie.de • Our aim was to recreate the original as closely as possible, in real time

  30. What are Subdivision Surfaces? • A curved surface defined as the limit of repeated subdivision steps on a polygonal model • Subdivision rules create new vertices, edges, faces based on neighboring features • We used the Catmull-Clark subdivision scheme (as used by Pixar) • MAX, Maya, Softimage, Lightwave all support forms of subdivision surfaces

  31. Realtime Adaptive Tessellation • Brute force subdivision is expensive • Generates lots of polygons where they aren’t needed • Number of polygons increases exponentially with each subdivision • Adaptive tessellation subdivides patches based on a screen-space patch size test • Guaranteed crack-free • Generates normals and tangents on the fly • Culls off-screen and back-facing patches • CPU-based (uses SSE where possible)
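
The tessellator runs on the CPU, but a plausible form of the screen-space size test looks like this (the pixel threshold, and the assumption that patch corners are already projected to pixel coordinates, are illustrative, not the demo's actual code):

    // Plausible screen-space size test: p0..p3 are the patch's
    // corner points projected to pixel coordinates; subdivide
    // further if any edge is too long on screen
    bool NeedsSubdivision(float2 p0, float2 p1, float2 p2, float2 p3,
                          float maxEdgePixels)
    {
        return distance(p0, p1) > maxEdgePixels ||
               distance(p1, p2) > maxEdgePixels ||
               distance(p2, p3) > maxEdgePixels ||
               distance(p3, p0) > maxEdgePixels;
    }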

  32. Control Mesh vs. Subdivided Mesh • Control mesh: 4,000 faces • Subdivided mesh: 17,000 triangles

  33. Control Mesh Detail

  34. Subdivided Mesh Detail

  35. Why Use Subdivision Surfaces? • Content • Characters were modeled with subdivision in mind (using the 3DSMax “MeshSmooth/NURMS” modifier) • Scalability • Wanted the demo to be scalable to lower-end hardware • “Infinite” detail • Can zoom in forever without seeing hard edges • Animation compression • Just store the low-res control mesh for each frame • May be accelerated on future GPUs

  36. Disadvantages of Realtime Subdivision • CPU intensive • But we might as well use the CPU for something! • View dependent • Requires re-tessellation for shadow map passes • Mesh topology changes from frame to frame • Makes motion blur difficult

  37. Ambient Occlusion Shading • Helps simulate the global illumination “look” of the original movie • Self occlusion is the degree to which an object shadows itself • “How much of the sky can I see from this point?” • Simulates a large spherical light surrounding the scene • Popular in production rendering – Pearl Harbor (ILM), Stuart Little 2 (Sony)

  38. Occlusion • [Diagram: occlusion over the hemisphere around the surface normal N]

  39. How To Calculate Occlusion • Shoot rays from the surface in random directions over the hemisphere (centered around the normal) • The percentage of rays that hit something is the occlusion amount • Can also keep track of the average of un-occluded directions – the “bent normal” • Some RenderMan-compliant renderers (e.g. Entropy) have a built-in occlusion() function that will do this • We can’t trace rays using graphics hardware (yet) • So we pre-calculate it!
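
In C-style pseudocode, the precomputation might look like this (SampleHemisphere and TraceRay are hypothetical stand-ins for the ray-tracing engine):

    // SampleHemisphere and TraceRay are hypothetical helpers;
    // the result is 1.0 where the point is fully open to the sky
    float ComputeOcclusion(float3 p, float3 n, int numRays)
    {
        int hits = 0;
        for (int i = 0; i < numRays; i++) {
            float3 dir = SampleHemisphere(n, i); // random dir about n
            if (TraceRay(p, dir))                // ray blocked?
                hits++;
        }
        return 1.0 - (float)hits / numRays;
    }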

  40. Occlusion Baking Tool • Uses a ray-tracing engine to calculate occlusion values for each vertex in the control mesh • We used 128 rays / vertex • Stored as a floating point scalar for each vertex and each frame of the animation • Calculation took around 5 hours for 1000 frames • Subdivision code interpolates occlusion values using cubic interpolation • Used as the ambient term in the shader
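
A small sketch of how the baked value might feed the shader (names illustrative): the interpolated per-vertex occlusion simply scales the ambient contribution.

    // Interpolated per-vertex occlusion scales the ambient term
    float3 Shade(float3 albedo, float3 diffuseLight,
                 float occlusion, float3 skyColor)
    {
        float3 ambient = occlusion * skyColor;
        return albedo * (ambient + diffuseLight);
    }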

  41. Ogre Demo • Show demo

  42. Procedural Shading in Time Machine • Goals for the Time Machine demo • Overview of effects • Metallic Paint • Wood • Chrome • Techniques used • Faux-BRDF reflection • Reveal and dXdT maps • Normal and DuDv scaling • Dynamic Bump mapping • Performance Issues • Summary

  43. Why do Time Machine? • GPUs are much more programmable • Thanks to generalized dependent texturing, more active textures (16 on GeForce FX) and (for our purposes) unlimited blend operations, high-quality animation is possible per-pixel • GeForce FX has >2x the performance of GeForce4 Ti • Executing lots of per-pixel operations isn’t just possible; it can be done in real time • Previous per-pixel animation was limited • Animated textures • PDE / CA effects (see Mark Harris’ talk at GDC) • Goal: Full-scene per-pixel animation

  44. Why do Time Machine? (continued) • Neglected pick-up trucks demonstrate a wide variety of surface effects, with intricate transitions and boundaries • Paint oxidizing, bleaching and rusting • Vinyl cracking • Wood splintering and fading • And more… Not possible with just per-vertex animation!
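
As a hedged illustration of the “reveal map” technique named in the overview (texture names and the blend rate are assumptions, not the demo's actual code): each texel stores the time at which the aged surface starts to show through, and the shader blends between a “new” and an “aged” texture accordingly.

    float4 main(float2 uv : TEXCOORD0,
                uniform texobj2D newTex,
                uniform texobj2D agedTex,
                uniform texobj2D revealTex,
                uniform float time) : COLOR
    {
        float4 cNew  = f4tex2D(newTex, uv);
        float4 cAged = f4tex2D(agedTex, uv);
        // per-texel time at which the aged look starts to show
        float revealTime = f4tex2D(revealTex, uv).x;
        float blend = saturate((time - revealTime) * 4.0); // soft edge
        return lerp(cNew, cAged, blend);
    }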
