1. August 2003 Efficient High-Level Shader Development Natalya Tatarchuk
3D Application Research Group
ATI Technologies, Inc.
2. August 2003 Overview Writing optimal HLSL code
Compiling issues
Optimization strategies
Code structure pointers
HLSL Shader Examples
Multi-layer car paint effect
Translucent Iridescent Shader
Überlight Shader
3. August 2003 Why use HLSL? Faster, easier effect development
Instant readability of your shader code
Better code re-use and maintainability
Optimization
Added benefit of HLSL compiler optimizations
Still helps to know what's under the hood
Industry standard which will run on cards from any vendor
Current and future industry direction
Increase your ability to iterate on a given shader design, resulting in better looking games
Conveniently manage shader permutations
4. August 2003 Compile Targets Legal HLSL is still independent of compile target chosen
But having an HLSL shader doesn't mean it will always run on any hardware!
Currently supported compile targets:
vs_1_1, vs_2_0, vs_2_sw
ps_1_1, ps_1_2, ps_1_3, ps_1_4, ps_2_0, ps_2_sw
Compilation is vendor-independent and is done by a D3DX component that Microsoft can update independent of the runtime release schedule
5. August 2003 Compilation Failure The obvious: program errors (bad syntax, etc)
Compile-target-specific reasons: your shader is too complex for the selected target
Not enough resources in the selected target
Uses too many registers (temporaries, for example)
Too many resulting asm instructions for the compile target
Lack of capability in the target
Such as trying to sample a texture in vs_1_1
Using dynamic branching when unsupported in the target
Sampling texture too many times for the target (Example: more than 6 for ps_1_4)
Compiler provides useful messages
6. August 2003 Use Disassembly for Hints Very helpful for understanding relationship between compile targets and code generation
Disassembly output provides valuable hints when compiling down to an older compile target
If successfully compiled for a more recent target (e.g. ps_2_0), look at the disassembly output for hints when failing to compile to an older target (e.g. ps_1_4)
Check out instruction count for ALU and tex ops
Figure out how HLSL instructions get mapped to assembly
Although the HLSL compiler reports the reasons for a compilation failure, you can also examine the disassembled code to better understand why compilation failed when you are pushing the limits of a particular compile target.
7. August 2003 Getting Disassembly Output for Your Shaders Directly use FXC
Compile for any target desired
Compile both individual shader files and full effects
Various input arguments
Allows turning shader optimizations on / off
Specify different entry points
Enable / disable generating debug information
8. August 2003 Easier Path to Disassembly Use RenderMonkey while developing shaders
See your changes in real-time
Disassembly output is updated every time a shader is compiled
Displays count for ALU and texture ops, as well as the limits for the selected target
Can save resulting assembly code into text file
Instead of jumping through the hoops of compiling your shaders from HLSL to binary asm through FXC, you can use RenderMonkey, which integrates that functionality for the convenience of shader developers.
You also have the option to save the resulting assembly code into corresponding vsh and psh files if you wish to ship the asm code rather than your HLSL shader (some developers prefer to keep their HLSL shaders hidden away).
9. August 2003 Optimizing HLSL Shaders Don't forget you are running on a vector processor
Do your computations at the most efficient frequency
Don't do something per-pixel that you can do per-vertex
Don't perform computation in a shader that you can precompute in the app
Use HLSL intrinsic functions
Helps hardware to optimize your shaders
Know your intrinsics and how they map to asm, especially asm modifiers
Important objective for high performance shaders: if you are hitting the limits of your pixel shader, or simply want to improve speed, and you can get away with doing a computation per-vertex rather than per-pixel, then do so. These types of operations are where the biggest wins often come from.
Here is where I could show an example disassembling a shader with pow using 8 and pow using a generic parameter: one should disassemble, the other won't. I can also show an example of how HLSL translates the normalize() intrinsic using rsq.
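As a rough sketch of the intrinsic-to-asm mapping discussed in the notes: normalize() is commonly expanded into a dot product, a reciprocal square root and a multiply, and a pow() with a small constant integer exponent can be expressed as repeated squaring. The function names below are illustrative only, and the instructions actually emitted depend on the compiler version and compile target, so treat the disassembly of your own shader as the authority.

```hlsl
// Hand-written equivalent of normalize( v ): a dot product, a reciprocal
// square root, then a multiply (roughly dp3 + rsq + mul in asm).
float3 NormalizeByHand( float3 v )
{
   return v * rsqrt( dot( v, v ) );
}

// x^8 written as repeated squaring: ((x^2)^2)^2, i.e. three multiplies.
// A pow() whose exponent is a run-time parameter cannot be expanded this
// way and typically costs more instructions.
float Pow8BySquaring( float x )
{
   float x2 = x * x;
   float x4 = x2 * x2;
   return x4 * x4;
}
```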
10. August 2003 HLSL Syntax Not Limited The HLSL code you write is not limited by the compile target you choose
You can always use loops, subroutines, if-else statements etc
If not natively supported in the selected compile target, the compiler will still try to generate code:
Loops will be unrolled
Subroutines will be inlined
If-else statements will execute both branches, selecting appropriate output as the result
Code generation is dependent upon compile target
Use appropriate data types to improve instruction count
Store your data in a vector when needed
However, using appropriate data types helps the compiler do a better job at optimizing your code
The choice of compile target doesn't mean that you cannot use certain language constructs in your shaders.
The HLSL compiler always tries to find a way to compile all possible constructs into the desired compile target.
Of course this may not be possible directly in some cases, but the compiler will try to find alternate approaches for generating the resulting assembly code.
For example, if a shader writer uses for loops, subroutines, or if-else statements for compile targets that do not natively support them, the compiler will unroll the loops and inline the subroutine calls.
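To make the loop unrolling concrete, here is a minimal sketch of a loop with a compile-time trip count. NUM_LIGHTS, vLightDir, cLightColor and AccumulateDiffuse are hypothetical names chosen for illustration. On a target without native looping, the compiler would be expected to emit the loop body NUM_LIGHTS times in a row.

```hlsl
#define NUM_LIGHTS 3                 // hypothetical compile-time light count

float3 vLightDir[NUM_LIGHTS];        // hypothetical: normalized directions to the lights
float3 cLightColor[NUM_LIGHTS];      // hypothetical: light colors

float3 AccumulateDiffuse( float3 vNormal )
{
   float3 cResult = 0;

   // The trip count is known at compile time, so this loop can be
   // fully unrolled into three copies of the body:
   for ( int i = 0; i < NUM_LIGHTS; i++ )
   {
      cResult += saturate( dot( vNormal, vLightDir[i] ) ) * cLightColor[i];
   }

   return cResult;
}
```

Because every iteration is paid for in instruction count, the unrolled size is what counts against the limits of the compile target.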
11. August 2003 Using If Statement in HLSL Can have large performance implications
Lack of branching support in most asm models
Both sides of an if statement will be executed
The output is chosen based on which side of the if would have been taken
Optimization is different than in the CPU programming world
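The two functions below sketch what "both sides will be executed" means in practice. ShadeWithIf and ShadeFlattened are hypothetical names. The second function spells out by hand roughly what is evaluated on a target without branching: both results are computed, and a select picks one.

```hlsl
// Written with an if statement:
float4 ShadeWithIf( float fNdotL, float4 cLit, float4 cShadow )
{
   float4 cResult;

   if ( fNdotL > 0 )
      cResult = cLit * fNdotL;
   else
      cResult = cShadow;

   return cResult;
}

// Roughly what runs on a target without branching support: both sides
// are always computed, then the condition selects which one is kept.
float4 ShadeFlattened( float fNdotL, float4 cLit, float4 cShadow )
{
   float4 cIfSide   = cLit * fNdotL;   // cost paid even when unused
   float4 cElseSide = cShadow;

   return ( fNdotL > 0 ) ? cIfSide : cElseSide;
}
```

The practical consequence is that, unlike on a CPU, wrapping expensive work in an if does not skip it on these targets. The cost of both sides is always paid.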
12. August 2003 Example of Using If in Vs_1_1
13. August 2003 Example of Function Inlining
14. August 2003 Code Permutations Via Compilation
15. August 2003 Scalar and Vector Data Types An important point to note is that the ps_2_0 and lower pixel shader models do not have native support for arbitrary swizzles.
Hence, concise high level code which uses swizzles can result in fairly nasty binary asm when compiling to these targets.
You should familiarize yourself with the native swizzles available in these assembly models.
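As an illustration only, the sketch below contrasts swizzles that are understood to be native in ps_2_0 (the replicate swizzles plus .yzxw, .zxyw and .wzyx) with an arbitrary one. That list is an assumption to verify against the assembly reference for your target. CrossByHand and ArbitrarySwizzle are hypothetical names.

```hlsl
// A cross product needs only .yzx and .zxy, which correspond to swizzles
// assumed to be native in ps_2_0, so no extra move instructions are needed:
float3 CrossByHand( float3 a, float3 b )
{
   return a.yzx * b.zxy - a.zxy * b.yzx;
}

// .xzyw and .wwxy are not in the assumed native set. On ps_2_0 the
// compiler would have to build these vectors with additional mov
// instructions before it can do the multiply:
float4 ArbitrarySwizzle( float4 v )
{
   return v.xzyw * v.wwxy;
}
```

Comparing the disassembly of the two functions for a ps_2_0 target is a quick way to see how much an innocent-looking swizzle costs.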
16. August 2003 Integer Data Type Added to make relative addressing more efficient
Using floats for addressing purposes without defined truncation rules can result in incorrect access to arrays.
All inputs used as ints should be defined as ints in your shader
It is very easy to generate extra instructions by using the int datatype in places where it should not be used.
The int datatype was added to HLSL to make relative addressing familiar as well as efficient.
The problem with using float datatypes for addressing without defined truncation rules is that incorrect access to arrays can occur.
To avoid unwanted rounding or truncation errors during addressing, the int datatype was added.
17. August 2003 Example of Integer Data Type Usage Matrix palette indices for skinning
Declaring variable as an int is a free operation => no truncation occurs
Using a float and casting it to an int or using directly => truncation will happen
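A minimal sketch of the matrix palette case, assuming a two-bone blend: MAX_BONES, mBonePalette and SkinPosition are hypothetical names, and the row-vector mul() convention is chosen here only for illustration. The indices are declared as int4, so they can be used for relative addressing directly, with no truncation code generated.

```hlsl
#define MAX_BONES 26                    // hypothetical palette size

float4x3 mBonePalette[MAX_BONES];       // hypothetical matrix palette

// vBoneIndices would typically arrive as a vertex input, for example
// "int4 vBoneIndices : BLENDINDICES", already declared as ints.
float3 SkinPosition( float4 vPos, int4 vBoneIndices, float4 vWeights )
{
   // Indexing with an int: no float-to-int truncation is needed.
   float3 vSkinned = mul( vPos, mBonePalette[vBoneIndices.x] ) * vWeights.x;
   vSkinned       += mul( vPos, mBonePalette[vBoneIndices.y] ) * vWeights.y;

   return vSkinned;
}
```

Had vBoneIndices been declared as float4, each index would have to be truncated before the array access, which is where the extra instructions come from.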
18. August 2003 Real-World Shader Examples Will present several case studies of developing shaders used in ATI's demos
Multi-tone car paint effect
Translucent iridescent effect
Classic Überlight example
Examples are presented as RenderMonkey™ workspaces
Distributed publicly with version 1.0 release
RenderMonkey allows you to concentrate on writing shaders without getting bogged down in app code
19. August 2003 Multi-Tone Car Paint
20. August 2003 Multi-Tone Car Paint Effect Multi-tone base color layer
Microflake layer simulation
Clear gloss coat
Dynamically Blurred Reflections
The application of paint to a car's body can be a complicated process. Expensive auto body paint is usually applied in layered stages and often includes dye layers, clear coat layers, and metallic flakes suspended in enamel. The result of these successive paint layers is a surface that exhibits complex light interactions, giving the car a smooth, glossy and sparkly finish.
We started working on this demo before HLSL was even available, so we developed our shaders in assembly. The shaders were designed from the very start to push the limits of performance, and they were fast. Later we decided to rewrite the shaders in HLSL, and this is how we approached it.
21. August 2003 Car Paint Layers Build Up
22. August 2003 Multi-Tone Base Paint Layer View-dependent lerping between three paint colors
Normal from appearance-preserving simplification process, N
Uses subtractive tone to control overall color accumulation
The car model shown here uses a relatively low number of polygons but employs a high precision normal map generated by an appearance-preserving simplification algorithm.
23. August 2003 Normal Decompression Sample from two-channel 16-16 normal map
Derive z from +sqrt(1 - x^2 - y^2)
Gives higher precision than typically used 8-8-8-8 normal map
Due to the pixel shader operations performed across smoothly changing surfaces (such as the hood of the car), a 16-bit per channel normal map is necessary.
Since the normals are stored in surface local coordinates (a.k.a. tangent space), we can assume that the z component of the normals will be positive. Thus, we can store x and y in two channels of a 16-16 texture map and derive z in the pixel shader from +sqrt(1 - x^2 - y^2).
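The derivation of z can be sketched in a few lines of HLSL. This is an illustration under stated assumptions rather than the demo's code: the sampler name normalMap and the function name DecompressNormal are hypothetical, and the two channels are assumed to be stored unsigned in [0, 1], so they are scaled and biased into [-1, 1] first.

```hlsl
sampler2D normalMap;     // hypothetical: two-channel 16-16 normal map

float3 DecompressNormal( float2 vTex )
{
   // Fetch x and y and move them from [0, 1] into [-1, 1]
   // (assumes an unsigned texture format):
   float2 vXY = 2.0 * tex2D( normalMap, vTex ).rg - 1.0;

   // Tangent-space normals point away from the surface, so z is the
   // positive root. saturate() guards against a slightly negative
   // argument caused by filtering or quantization.
   float fZ = sqrt( saturate( 1.0 - dot( vXY, vXY ) ) );

   return float3( vXY, fZ );
}
```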
24. August 2003 Multi-Tone Base Coat Vertex Shader
25. August 2003 Multi-Tone Base Coat Pixel Shader
26. August 2003 Microflake Layer In this portion of the shader we simulate the appearance of metallic flakes suspended in enamel.
27. August 2003 Microflake Deposit Layer
28. August 2003 Computing Microflake Layer Normals Start out by using normal vector fetched from the normal map, N
Using the high frequency noise map, compute perturbed normal Np
Simulate two layers of microflake deposits by computing perturbed normals Np1 and Np2
29. August 2003 Microflake Layer Vertex Shader
VS_OUTPUT main( float4 Pos: POSITION, float3 Normal: NORMAL, float2 Tex: TEXCOORD0, float3 Tangent: TANGENT, float3 Binormal: BINORMAL )
{
   VS_OUTPUT Out = (VS_OUTPUT) 0;

   // Propagate transformed position out:
   Out.Pos = mul( view_proj_matrix, Pos );

   // Compute view vector:
   Out.View = normalize( mul( inv_view_matrix, float4( 0, 0, 0, 1 )) - Pos );

   // Propagate texture coordinates:
   Out.Tex = Tex;

   // Propagate tangent, binormal, and normal vectors to pixel shader:
   Out.Normal = Normal;
   Out.Tangent = Tangent;
   Out.Binormal = Binormal;

   // Compute microflake tiling factor:
   Out.SparkleTex = float4( Tex * fFlakeTilingFactor, 0, 1 );

   return Out;
}
Possibly get rid of this slide; I don't remember why we were changing the texture coords here.
30. August 2003 Microflake Layer Pixel Shader
float4 main( float4 Diff: COLOR0, float2 Tex: TEXCOORD0, float3 Tangent: TEXCOORD1, float3 Binormal: TEXCOORD2, float3 Normal: TEXCOORD3, float3 View: TEXCOORD4, float3 SparkleTex: TEXCOORD5 ) : COLOR
{
   // Fetch and signed-scale the normal from the microflake normal map:
   float3 vFlakesNormal = 2 * tex2D( microflakeNMap, SparkleTex ) - 1;

   // Compute perturbed normals for the two microflake layers. vNormal is
   // the surface normal N fetched from the normal map (fetch not shown
   // on this slide):
   float3 vNp1 = microflakePerturbationA * vFlakesNormal + normalPerturbation * vNormal;
   float3 vNp2 = microflakePerturbation * ( vFlakesNormal + vNormal );

   float3 vView = normalize( View );
   float3x3 mTangentToWorld = transpose( float3x3( Tangent, Binormal, Normal ));

   // Transform the perturbed normals to world space and compute the
   // view-dependent terms:
   float3 vNp1World = normalize( mul( mTangentToWorld, vNp1 ));
   float fFresnel1 = saturate( dot( vNp1World, vView ));
   float3 vNp2World = normalize( mul( mTangentToWorld, vNp2 ));
   float fFresnel2 = saturate( dot( vNp2World, vView ));
   float fFresnel1Sq = fFresnel1 * fFresnel1;

   // Accumulate the flake color polynomial:
   float4 paintColor = fFresnel1 * flakeColor + fFresnel1Sq * flakeColor + fFresnel1Sq * fFresnel1Sq * flakeColor + pow( fFresnel2, 16 ) * flakeColor;

   return float4( paintColor.rgb, 1.0 );
}
The microflake normal map is a high-frequency normalized vector noise map which is repeated across the entire surface. Fetching a value from it for each pixel allows us to compute a perturbed normal for the surface to simulate the appearance of microflakes suspended in the coat of paint.
This shader simulates two layers of microflakes suspended in the coat of paint.
To compute the surface normal for the first layer, the following formula is used:
31. August 2003 Clear Gloss Coat
32. August 2003 RGBScale HDR Environment Map Alpha channel contains 1/16 of the true HDR scale of the pixel value
RGB contains normalized color of the pixel
Pixel shader reconstructs HDR value from scale*8*color to get half of the true HDR value
Obvious quantization issues, but reasonable for some applications
Similar to Ward's RGBE "Real Pixels" but simpler to reconstruct in the pixel shader
One interesting aspect of the clear coat term is the decision to store the environment map in an RGBScale form to simulate high dynamic range in a low memory footprint. The alpha channel of the texture, shown on the right in figure 4, represents 1/16th of the true range of the data while the RGB, shown on the left, represents the normalized color. In the pixel shader, the alpha channel and RGB channels are multiplied together and multiplied by eight to reconstruct a cheap form of HDR reflectance. This is multiplied by a subtle Fresnel term before being added to the lighting terms described above.
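The reconstruction described above amounts to one line of shader code. The sketch below uses a hypothetical function name, DecodeRGBScale, and hard-codes the factor of eight from the slide. The result is half of the true HDR value, as the slide notes.

```hlsl
// vTexel.rgb holds the normalized color, vTexel.a holds 1/16 of the
// true HDR scale. Multiplying by 8 yields half of the true HDR value.
float3 DecodeRGBScale( float4 vTexel )
{
   return vTexel.rgb * vTexel.a * 8.0;
}
```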
33. August 2003 Environment Map
34. August 2003 Dynamically Blurred Reflections
35. August 2003 Dynamic Blurring of Environment Map Reflections A gloss map can be supplied to specify the regions where reflections can be blurred
Use bias when sampling the environment map to vary blurriness of the resulting reflections
Use texCUBEbias to access the cubic environment map
For rough specular, the bias is high, causing a blurring effect
Can also convert color fetched from environment map to luminance in rough trim areas
36. August 2003 Clear Gloss Coat Pixel Shader
float4 ps_main( ... /* same inputs as in the previous shader */ )
{
   // ... use normal in world space (see Multi-tone pixel shader)

   // Compute reflection vector:
   float fFresnel = saturate( dot( vNormalWorld, vView ));
   float3 vReflection = 2 * vNormalWorld * fFresnel - vView;

   // Here we just use a constant gloss value to bias reading from the
   // environment map; in the real demo we use a gloss map which specifies
   // which regions will have the reflection slightly blurred.
   float fEnvBias = glossLevel;

   // Sample environment map using this reflection vector and bias:
   float4 envMap = texCUBEbias( showroomMap, float4( vReflection, fEnvBias ) );

   // Premultiply by alpha:
   envMap.rgb = envMap.rgb * envMap.a;

   // Brighten the environment map sampling result:
   envMap.rgb *= brightnessFactor;

   // Combine result of environment map reflection with the paint color:
   float fEnvContribution = 1.0 - 0.5 * fFresnel;

   return float4( envMap.rgb * fEnvContribution, 1.0 );
}
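For the gloss-map variant mentioned in the comment, a minimal sketch might look like the following. Everything except showroomMap and texCUBEbias is an assumption made for illustration: the sampler glossMap, the constant fMaxEnvBias, the convention that a gloss value of 1 means fully glossy, and the luminance weights are not taken from the demo.

```hlsl
sampler2D   glossMap;       // hypothetical: 1 = glossy paint, 0 = rough trim
samplerCUBE showroomMap;    // environment map, named as on the slide
float       fMaxEnvBias;    // hypothetical: mip bias used for the roughest areas

float3 SampleBlurredReflection( float3 vReflection, float2 vTex )
{
   float fGloss = tex2D( glossMap, vTex ).r;

   // Rough regions get a high bias, which selects blurrier mip levels:
   float fEnvBias = ( 1.0 - fGloss ) * fMaxEnvBias;
   float4 envMap = texCUBEbias( showroomMap, float4( vReflection, fEnvBias ) );

   // Premultiply by alpha, as in the shader on the slide:
   float3 cReflection = envMap.rgb * envMap.a;

   // Fade toward luminance in rough areas (assumed weights):
   float fLuminance = dot( cReflection, float3( 0.3, 0.59, 0.11 ) );
   float3 cGray = float3( fLuminance, fLuminance, fLuminance );

   return lerp( cGray, cReflection, fGloss );
}
```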
37. August 2003 Compositing Multi-Tone Base Layer and Microflake Layer Base color and flake effect are derived from Np1 and Np2 using the following polynomial:
color0 * (Np1 · V) + color1 * (Np1 · V)^2 + color2 * (Np1 · V)^4 + color3 * (Np2 · V)^16
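A sketch of this polynomial as an HLSL helper follows. The function name CompositePaint and the parameter names color0 through color3 mirror the formula and are hypothetical. The perturbed normals and view vector are assumed to be normalized and in the same space.

```hlsl
float3 CompositePaint( float3 vNp1, float3 vNp2, float3 vView,
                       float3 color0, float3 color1,
                       float3 color2, float3 color3 )
{
   float fNp1V   = saturate( dot( vNp1, vView ) );
   float fNp2V   = saturate( dot( vNp2, vView ) );
   float fNp1VSq = fNp1V * fNp1V;

   // color0 (Np1.V) + color1 (Np1.V)^2 + color2 (Np1.V)^4 + color3 (Np2.V)^16
   return color0 * fNp1V
        + color1 * fNp1VSq
        + color2 * fNp1VSq * fNp1VSq
        + color3 * pow( fNp2V, 16 );
}
```

The high sixteenth power on the second layer keeps the flake term confined to a narrow range of view angles, which is what makes it read as sparkle rather than as a base tone.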
38. August 2003 Compositing Final Look
39. August 2003 Original Hand-Tuned Assembly
40. August 2003 Car Paint Shader HLSL Compiler Disassembly Output
41. August 2003 Full Result of Multi-Layer Paint
42. August 2003 Translucent Iridescent Shader: Butterfly Wings Perhaps a better screenshot with the butterfly body included.
43. August 2003 Translucent Iridescent Shader: Butterfly Wings Simulates translucency of delicate butterfly wings
Wings glow from scattered reflected light
Similar to the effect of softly backlit rice paper
Displays subtle iridescent lighting
Similar to rainbow pattern on the surface of soap bubbles
Caused by the interference of light waves resulting from multiple reflections of light off of surfaces of varying thickness
Combines gloss, opacity and normal maps for a multi-layered final look
Gloss map contributes to satiny highlights
Opacity map allows portions of wings to be transparent
Normal map is used to give wings a bump-mapped look
A translucent material allows light to pass through, yet it isn't transparent. It receives light and can be luminous only from an outside source.
If you hold a sheet of paper in front of a light source, you can see that the light makes it glow, yet you cannot see the light source through the paper because the paper scatters the light.
Iridescence, which can be seen as a rainbow pattern on the surface of soap bubbles and gasoline spills, is the effect caused by the interference of light waves resulting from multiple reflections of light off of surfaces of varying thickness. Mother-of-pearl and compact discs share this quality with the wings of some butterflies. For example, Morpho butterfly wings emit a brilliant blue color while other colors are absorbed.
44. August 2003 RenderMonkey Butterfly Wings Shader Example Parameters that contribute to the translucency and iridescence look:
Light position and scene ambient color
Translucency coefficient
Gloss scale and bias
Scale and bias for speed of iridescence change
Workspace: Iridescent Butterfly.rfx
45. August 2003 Translucent Iridescent Shader: Vertex Shader ..
// Propagate input texture coordinates:
Out.Tex = Tex;
// Define tangent space matrix:
float3x3 mTangentSpace;
mTangentSpace[0] = Tangent;
mTangentSpace[1] = Binormal;
mTangentSpace[2] = Normal;
// Compute the light vector (object space):
float3 vLight = normalize( mul( inv_view_matrix, lightPos ) - Pos );
// Output light vector in tangent space:
Out.Light = mul( mTangentSpace, vLight );
// Compute the view vector (object space):
float3 vView = normalize( mul( inv_view_matrix, float4(0,0,0,1)) - Pos );
// Output view vector in tangent space:
Out.View = mul( mTangentSpace, vView );
// Compute the half angle vector (in tangent space):
Out.Half = mul( mTangentSpace, normalize( vView + vLight ) );
return Out;
46. August 2003 Translucent Iridescent Shader: Loading Information
47. August 2003 Diffuse Illumination For Translucency
48. August 2003 Adding Opacity to Butterfly Wings Resulting color is modulated by the opacity value to add transparency to the wings:
Normally when you want to blend something that's transparent, you would just do it in your alpha blending stage. But if it's specular, you don't want to blend before you apply the specular highlights. One way to do it properly would be to multipass (do one diffuse pass and one specular additive pass), but this is an approach to do it in a single pass.
49. August 2003 Making Butterfly Wings Iridescent
50. August 2003 Assembling Final Color
51. August 2003 HLSL Disassembly Comparison
52. August 2003 Example of Translucent Iridescent Shader
53. August 2003 Optimization Study: Überlight Flexible light described in JGT article "Lighting Controls for Computer Cinematography" by Ronen Barzel of Pixar
Überlight is procedural and has many controls:
light type, intensity, light color, cuton, cutoff, near edge, far edge, falloff, falloff distance, max intensity, parallel rays, shearx, sheary, width, height, width edge, height edge, roundness and beam distribution
Code here is based upon the public domain RenderMan implementation by Larry Gritz
JGT == Journal of Graphics Tools
54. August 2003 Überlight Spotlight Mode Spotlight mode defines a procedural volume with smooth boundaries
Shape of spotlight is made up of two nested superellipses which are swept along direction of light
Also has smooth cuton and cutoff planes
Can tune parameters to get all sorts of looks
55. August 2003 Überlight Spotlight Volume Cuton and cutoff planes are left out for this diagram
56. August 2003 Überlight Spotlight Volume Cuton and cutoff planes are left out for this diagram
57. August 2003 Original clipSuperellipse() routine This is a key subroutine in the Überlight shader. It computes attenuation as a function of a point's position in the swept superellipses: 1 inside the inner ellipse, 0 outside the outer ellipse, and a smoothstep in between.
58. August 2003 Vectorized Version The R3x0 cycle counts are lower due to the ability to co-issue, as well as some other secret sauce we aren't telling about.
59. August 2003 smoothstep() function Standard function in procedural shading
Intrinsics built into RenderMan and DirectX HLSL:
60. August 2003 C implementation
61. August 2003 HLSL implementation The free saturate handles x outside of [edge0..edge1] range
Know how to use saturate to do this kind of thresholding for you
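The slide's code did not survive in this transcript, so the following is a sketch of the standard smoothstep formulation it describes, not necessarily the exact code shown. The name SmoothStepSketch is hypothetical and is used to avoid clashing with the built-in smoothstep() intrinsic.

```hlsl
float SmoothStepSketch( float edge0, float edge1, float x )
{
   // Remap x so that edge0 maps to 0 and edge1 maps to 1. saturate()
   // clamps the result, which handles x outside [edge0..edge1] without
   // any explicit comparisons or branches.
   float t = saturate( ( x - edge0 ) / ( edge1 - edge0 ) );

   // Hermite interpolation: 3t^2 - 2t^3
   return t * t * ( 3.0 - 2.0 * t );
}
```

The clamp is described as "free" on the slide because saturation can be applied as a modifier on the instruction that produces the value, rather than costing an instruction of its own.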
62. August 2003 Vectorized HLSL Implementation Operation performed on float3s to compute three different smoothstep operations in parallel
This multiplication can be done as a vector operation, whereas rcp is defined to be a scalar operation and hence would have broken the vector nature of this routine.
OneOverWidth is computed outside of the shader for two of the three smoothsteps in Überlight, so this optimization is a win.
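A sketch of what such a vectorized routine could look like follows. The function name SmoothStep3 and the parameter order are assumptions made for illustration. OneOverWidth is the name used in the notes and is taken to mean 1 / (edge1 - edge0) per component.

```hlsl
// Three independent smoothsteps evaluated at once, one per component.
// OneOverWidth = 1 / ( edge1 - edge0 ), supplied by the caller so that
// the shader multiplies (a vector op) instead of dividing (scalar rcp).
float3 SmoothStep3( float3 edge0, float3 OneOverWidth, float3 x )
{
   float3 t = saturate( ( x - edge0 ) * OneOverWidth );

   return t * t * ( 3.0 - 2.0 * t );
}
```

When the edges are constant per draw call, the application can compute OneOverWidth once on the CPU and upload it as a shader constant. This is the "precompute in the app" advice from earlier in the talk.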
63. August 2003 Summary Writing optimal HLSL code
Compiling issues
Optimization strategies
Code structure pointers
Shader Examples
Shipped with RenderMonkey version 1.0; see www.ati.com/developer