Efficient High-Level Shader Development

Presentation Transcript


    1. August 2003 Efficient High-Level Shader Development Natalya Tatarchuk 3D Application Research Group ATI Technologies, Inc.

    2. Overview
       - Writing optimal HLSL code
         - Compiling issues
         - Optimization strategies
         - Code structure pointers
       - HLSL Shader Examples
         - Multi-layer car paint effect
         - Translucent Iridescent Shader
         - Überlight Shader

    3. Why use HLSL?
       - Faster, easier effect development
         - Instant readability of your shader code
         - Better code re-use and maintainability
       - Optimization
         - Added benefit of HLSL compiler optimizations
         - Still helps to know what's under the hood
       - Industry standard which will run on cards from any vendor
       - Current and future industry direction
       - Increase your ability to iterate on a given shader design, resulting in better looking games
       - Conveniently manage shader permutations

    4. Compile Targets
       - Legal HLSL is still independent of the compile target chosen
       - But having an HLSL shader doesn't mean it will always run on any hardware!
       - Currently supported compile targets:
         - vs_1_1, vs_2_0, vs_2_sw
         - ps_1_1, ps_1_2, ps_1_3, ps_1_4, ps_2_0, ps_2_sw
       - Compilation is vendor-independent and is done by a D3DX component that Microsoft can update independently of the runtime release schedule

    5. Compilation Failure
       - The obvious: program errors (bad syntax, etc.)
       - Compile-target-specific reasons: your shader is too complex for the selected target
         - Not enough resources in the selected target
           - Uses too many registers (temporaries, for example)
           - Too many resulting asm instructions for the compile target
         - Lack of capability in the target
           - Such as trying to sample a texture in vs_1_1
           - Using dynamic branching when unsupported in the target
           - Sampling a texture too many times for the target (example: more than 6 for ps_1_4)
       - Compiler provides useful messages

    6. Use Disassembly for Hints
       - Very helpful for understanding the relationship between compile targets and code generation
       - Disassembly output provides valuable hints when compiling down to an older compile target
       - If successfully compiled for a more recent target (e.g. ps_2_0), look at the disassembly output for hints when failing to compile to an older target (e.g. ps_1_4)
         - Check out instruction count for ALU and tex ops
         - Figure out how HLSL instructions get mapped to assembly
       Speaker notes: Although the HLSL compiler will display the reasons for a compilation failure, you can also examine the disassembled code to get a better understanding of why your compilation failed when you are pushing the limits of a particular compile target.

    7. Getting Disassembly Output for Your Shaders
       - Directly use FXC
         - Compile for any target desired
         - Compile both individual shader files and full effects
         - Various input arguments
           - Turn shader optimizations on / off
           - Specify different entry points
           - Enable / disable generating debug information

    8. Easier Path to Disassembly
       - Use RenderMonkey while developing shaders
         - See your changes in real-time
         - Disassembly output is updated every time a shader is compiled
         - Displays count for ALU and texture ops, as well as the limits for the selected target
         - Can save resulting assembly code into a text file
       Speaker notes: Instead of jumping through the hoops of compiling your shaders from HLSL to binary asm through FXC, RenderMonkey integrates that functionality for the convenience of shader developers. You also have the option to save the resulting assembly code into corresponding vsh and psh files if you wish to ship the asm code rather than your HLSL shader (some developers would like to keep their HLSL shaders hidden away).

    9. Optimizing HLSL Shaders
       - Don't forget you are running on a vector processor
       - Do your computations at the most efficient frequency
         - Don't do something per-pixel that you can do per-vertex
         - Don't perform computation in a shader that you can precompute in the app
       - Use HLSL intrinsic functions
         - Helps hardware to optimize your shaders
         - Know your intrinsics and how they map to asm, especially asm modifiers
       Speaker notes: Important objective for high performance shaders: if you are hitting the limits of your pixel shader or just plainly want to improve the speed, and you can get away with doing a computation per-vertex rather than per-pixel, then do so. These types of operations are where the biggest wins often come from. Here is where I could show an example disassembling a shader with pow using 8 and pow using a generic parameter: one should disassemble, the other won't. I can also show an example of how HLSL translates the normalize() intrinsic using rsq.
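
The pow comparison mentioned in the notes for this slide can be sketched as follows. This is an illustration only, not the listing from the talk: the function names and the `fSpecExp` constant are hypothetical. Compiling both functions and comparing their disassembly shows how a literal exponent and a run-time exponent lead to different code.

```hlsl
// Hypothetical shader constant, set by the application.
float fSpecExp;

// Literal exponent: the compiler is free to expand this into
// three successive multiplies (x^2, x^4, x^8).
float SpecularFixed( float fNdotH )
{
    return pow( fNdotH, 8 );
}

// Run-time exponent: the value is unknown at compile time, so the
// compiler typically falls back to a log / multiply / exp sequence.
float SpecularGeneric( float fNdotH )
{
    return pow( fNdotH, fSpecExp );
}
```

The exact instructions emitted depend on the compile target and compiler version, so it is worth checking the disassembly rather than relying on this description.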

    10. HLSL Syntax Not Limited
        - The HLSL code you write is not limited by the compile target you choose
        - You can always use loops, subroutines, if-else statements, etc.
        - If not natively supported in the selected compile target, the compiler will still try to generate code:
          - Loops will be unrolled
          - Subroutines will be inlined
          - If-else statements will execute both branches, selecting the appropriate output as the result
        - Code generation is dependent upon the compile target
        - Use appropriate data types to improve instruction count
          - Store your data in a vector when needed
          - Using appropriate data types helps the compiler do a better job of optimizing your code
        Speaker notes: The choice of compile target doesn't mean that you cannot use certain language constructs in your shaders. The HLSL compiler always tries to find a way to compile all possible constructs into the desired compile target. Of course this may not be possible directly in some cases, but the compiler will try to find alternate approaches for generating the resulting assembly code. For example, if a shader writer uses for loops, subroutines, or if-else statements for compile targets that do not natively support them, the compiler will unroll the loops, inline the subroutine calls, and execute both branches of each if-else statement, selecting the appropriate result.

    11. Using If Statement in HLSL
        - Can have large performance implications
          - Lack of branching support in most asm models
        - Both sides of an if statement will be executed
          - The output is chosen based on which side of the if would have been taken
        - Optimization is different than in the CPU programming world
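
To make the cost described on this slide concrete, here is a minimal sketch (not taken from the talk) of an if-else and the equivalent explicit selection. The constants `litColor` and `shadowColor` and both function names are hypothetical. On targets without native branching, the first function is expected to compile to roughly what the second one spells out: both values are computed and one is selected.

```hlsl
// Hypothetical constants used only for illustration.
float4 litColor;
float4 shadowColor;

// Written with a branch: on targets without native branching both
// assignments are evaluated and one result is selected.
float4 ShadeWithIf( float fNdotL )
{
    float4 cResult;
    if ( fNdotL >= 0 )
        cResult = litColor * fNdotL;
    else
        cResult = shadowColor;
    return cResult;
}

// The same selection written explicitly: step() yields 0 or 1 and
// lerp() picks between the two precomputed values.
float4 ShadeWithSelect( float fNdotL )
{
    float fLit = step( 0, fNdotL );   // 1 when fNdotL >= 0, else 0
    return lerp( shadowColor, litColor * fNdotL, fLit );
}
```

Either way the work for both sides is paid for, which is why moving expensive code into only one side of an if does not save instructions on these targets.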

    12. Example of Using If in vs_1_1

    13. Example of Function Inlining

    14. Code Permutations Via Compilation

    15. Scalar and Vector Data Types
        Speaker notes: An important point to note is that the ps_2_0 and lower pixel shader models do not have native support for arbitrary swizzles. Hence, concise high-level code which uses swizzles can result in fairly nasty binary asm when compiling to these targets. You should familiarize yourself with the native swizzles available in these assembly models.

    16. Integer Data Type
        - Added to make relative addressing more efficient
        - Using floats for addressing purposes without defined truncation rules can result in incorrect access to arrays
        - All inputs used as ints should be defined as ints in your shader
        Speaker notes: It is very easy to generate extra instructions by using the int datatype in places where it should not be used. The int datatype was added to HLSL to make relative addressing familiar as well as efficient. The problem with using float datatypes for addressing purposes without truncation rules is that incorrect access to arrays can occur. The int datatype was added in order to avoid unwanted rounding or truncation errors during addressing.

    17. Example of Integer Data Type Usage
        - Matrix palette indices for skinning
          - Declaring a variable as an int is a free operation => no truncation occurs
          - Using a float and casting it to an int, or using it directly => truncation will happen
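
A minimal sketch of the matrix palette case, assuming a two-bone blend. The palette size, the `mBonePalette` name, and the input layout are assumptions made for illustration and are not taken from the talk; `view_proj_matrix` follows the naming used in the shaders later in this deck.

```hlsl
// Hypothetical constants: a 25-entry bone palette and the usual
// view-projection matrix.
float4x3 mBonePalette[25];
float4x4 view_proj_matrix;

float4 main( float4 Pos     : POSITION,
             float4 Weights : BLENDWEIGHT,
             int4   Indices : BLENDINDICES ) : POSITION
{
    // Indices are declared as int, so they can be used for relative
    // addressing directly and no truncation code should be generated.
    float3 vSkinned = mul( Pos, mBonePalette[Indices.x] ) * Weights.x
                    + mul( Pos, mBonePalette[Indices.y] ) * Weights.y;

    return mul( view_proj_matrix, float4( vSkinned, 1 ) );
}
```

Declaring `Indices` as `float4` and casting at each use would be expected to add truncation instructions per access, which is the cost the slide warns about; the disassembly confirms whether that happens on a given target.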

    18. Real-World Shader Examples
        - Will present several case studies of developing shaders used in ATI's demos
          - Multi-tone car paint effect
          - Translucent iridescent effect
          - Classic Überlight example
        - Examples are presented as RenderMonkey™ workspaces
          - Distributed publicly with version 1.0 release
        Speaker notes: RenderMonkey allows you to concentrate on writing shaders without getting bogged down in app code.

    19. Multi-Tone Car Paint

    20. Multi-Tone Car Paint Effect
        - Multi-tone base color layer
        - Microflake layer simulation
        - Clear gloss coat
        - Dynamically blurred reflections
        Speaker notes: The application of paint to a car's body can be a complicated process. Expensive auto body paint is usually applied in layered stages and often includes dye layers, clear coat layers, and metallic flakes suspended in enamel. The result of these successive paint layers is a surface that exhibits complex light interactions, giving the car a smooth, glossy and sparkly finish. We started working on this demo at a time when HLSL wasn't even available yet, so we developed our shaders in assembly. The shaders were designed from the very start to push the limits of performance, and they were fast. Later we decided to rewrite the shaders in HLSL, and this is how we approached it.

    21. Car Paint Layers Build Up

    22. Multi-Tone Base Paint Layer
        - View-dependent lerping between three paint colors
        - Normal from appearance-preserving simplification process, N
        - Uses subtractive tone to control overall color accumulation
        Speaker notes: The car model shown here uses a relatively low number of polygons but employs a high-precision normal map generated by an appearance-preserving simplification algorithm.

    23. Normal Decompression
        - Sample from two-channel 16-16 normal map
        - Derive z from +sqrt(1 - x^2 - y^2)
        - Gives higher precision than the typically used 8-8-8-8 normal map
        Speaker notes: Due to the pixel shader operations performed across the smoothly changing surfaces (such as the hood of the car), a 16-bit per channel normal map is necessary. Since the normals are stored in surface local coordinates (a.k.a. tangent space), we can assume that the z component of the normals will be positive. Thus, we can store x and y in two channels of a 16-16 texture map and derive z in the pixel shader from +sqrt(1 - x^2 - y^2).
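
The derivation of z on this slide can be sketched as a small helper. This is an illustration under assumptions, not the demo's code: the sampler name `normalMap` and the function name are hypothetical, and x and y are assumed to be stored remapped to the [0, 1] range.

```hlsl
// Hypothetical sampler bound to the two-channel 16-16 normal map.
sampler normalMap;

float3 FetchNormal( float2 Tex )
{
    // Assumes x and y are stored remapped to [0, 1]; scale and bias
    // them back to [-1, 1].
    float2 vXY = 2 * tex2D( normalMap, Tex ).xy - 1;

    // Tangent-space normals point away from the surface, so z >= 0.
    // saturate() guards against a slightly negative argument caused
    // by quantization of x and y.
    float fZ = sqrt( saturate( 1 - dot( vXY, vXY ) ) );

    return float3( vXY, fZ );
}
```

Because the result is built from a unit-length constraint, it needs no further normalization unless texture filtering has shortened the stored x and y.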

    24. Multi-Tone Base Coat Vertex Shader

    25. Multi-Tone Base Coat Pixel Shader

    26. Microflake Layer
        Speaker notes: In this portion of the shader we simulate the appearance of metallic flakes suspended in enamel.

    27. Microflake Deposit Layer

    28. Computing Microflake Layer Normals
        - Start out by using the normal vector fetched from the normal map, N
        - Using the high frequency noise map, compute perturbed normal Np
        - Simulate two layers of microflake deposits by computing perturbed normals Np1 and Np2

    29. Microflake Layer Vertex Shader

        VS_OUTPUT main( float4 Pos:      POSITION,
                        float3 Normal:   NORMAL,
                        float2 Tex:      TEXCOORD0,
                        float3 Tangent:  TANGENT,
                        float3 Binormal: BINORMAL )
        {
           VS_OUTPUT Out = (VS_OUTPUT) 0;

           // Propagate transformed position out:
           Out.Pos = mul( view_proj_matrix, Pos );

           // Compute view vector:
           Out.View = normalize( mul( inv_view_matrix, float4( 0, 0, 0, 1 ) ) - Pos );

           // Propagate texture coordinates:
           Out.Tex = Tex;

           // Propagate tangent, binormal, and normal vectors to pixel shader:
           Out.Normal   = Normal;
           Out.Tangent  = Tangent;
           Out.Binormal = Binormal;

           // Compute microflake tiling factor:
           Out.SparkleTex = float4( Tex * fFlakeTilingFactor, 0, 1 );

           return Out;
        }

        Speaker notes: Possibly get rid of this slide. I don't remember why we were changing the texture coords here.

    30. Microflake Layer Pixel Shader

        float4 main( float4 Diff:       COLOR0,
                     float2 Tex:        TEXCOORD0,
                     float3 Tangent:    TEXCOORD1,
                     float3 Binormal:   TEXCOORD2,
                     float3 Normal:     TEXCOORD3,
                     float3 View:       TEXCOORD4,
                     float3 SparkleTex: TEXCOORD5 ) : COLOR
        {
           // Fetch and signed-scale the normal fetched from the normal map:
           float3 vFlakesNormal = 2 * tex2D( microflakeNMap, SparkleTex ) - 1;

           // vNormal is the surface normal N fetched from the normal map
           // (that fetch is not shown on this slide).
           float3 vNp1 = microflakePerturbationA * vFlakesNormal + normalPerturbation * vNormal;
           float3 vNp2 = microflakePerturbation * ( vFlakesNormal + vNormal );

           float3 vView = normalize( View );
           float3x3 mTangentToWorld = transpose( float3x3( Tangent, Binormal, Normal ) );

           float3 vNp1World = normalize( mul( mTangentToWorld, vNp1 ) );
           float  fFresnel1 = saturate( dot( vNp1World, vView ) );

           float3 vNp2World = normalize( mul( mTangentToWorld, vNp2 ) );
           float  fFresnel2 = saturate( dot( vNp2World, vView ) );

           float  fFresnel1Sq = fFresnel1 * fFresnel1;

           float4 paintColor = fFresnel1 * flakeColor +
                               fFresnel1Sq * flakeColor +
                               fFresnel1Sq * fFresnel1Sq * flakeColor +
                               pow( fFresnel2, 16 ) * flakeColor;

           // paintColor is a float4, so take its rgb before appending alpha:
           return float4( paintColor.rgb, 1.0 );
        }

        Speaker notes: The microflake normal map is a high frequency normalized vector noise map which is repeated across the whole surface. Fetching a value from it for each pixel allows us to compute a perturbed normal for the surface to simulate the appearance of microflakes suspended in the coat of paint. This shader simulates two layers of microflakes suspended in the coat of paint. To compute the surface normal for the first layer, the following formula is used:

    31. Clear Gloss Coat

    32. RGBScale HDR Environment Map
        - Alpha channel contains 1/16 of the true HDR scale of the pixel value
        - RGB contains normalized color of the pixel
        - Pixel shader reconstructs HDR value from scale*8*color to get half of the true HDR value
        - Obvious quantization issues, but reasonable for some applications
        - Similar to Ward's RGBE "Real Pixels" but simpler to reconstruct in the pixel shader
        Speaker notes: One interesting aspect of the clear coat term is the decision to store the environment map in an RGBScale form to simulate high dynamic range in a low memory footprint. The alpha channel of the texture, shown on the right in figure 4, represents 1/16th of the true range of the data while the RGB, shown on the left, represents the normalized color. In the pixel shader, the alpha channel and RGB channels are multiplied together and multiplied by eight to reconstruct a cheap form of HDR reflectance. This is multiplied by a subtle Fresnel term before being added to the lighting terms described above.
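
The reconstruction described on this slide amounts to one multiply chain. The helper below is a sketch with a hypothetical function name; `showroomMap` is the environment map sampler name used in the clear gloss coat shader later in the deck.

```hlsl
// Environment map stored in RGBScale form: rgb is the normalized
// color, alpha is 1/16 of the true HDR scale.
samplerCUBE showroomMap;

float3 FetchHalfHDR( float3 vReflection )
{
    float4 cEnv = texCUBE( showroomMap, vReflection );

    // alpha * 16 would restore the full scale; alpha * 8 yields
    // half of the true HDR value, as stated on the slide.
    return cEnv.rgb * cEnv.a * 8;
}
```

Keeping the result at half scale leaves headroom before later terms are added, at the cost of the quantization issues the slide mentions.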

    33. Environment Map

    34. Dynamically Blurred Reflections

    35. Dynamic Blurring of Environment Map Reflections
        - A gloss map can be supplied to specify the regions where reflections can be blurred
        - Use bias when sampling the environment map to vary blurriness of the resulting reflections
          - Use texCUBEbias to access the cubic environment map
        - For rough specular, the bias is high, causing a blurring effect
          - Can also convert color fetched from environment map to luminance in rough trim areas

    36. Clear Gloss Coat Pixel Shader

        float4 ps_main( ... /* same inputs as in the previous shader */ ) : COLOR
        {
           // ... use normal in world space (see Multi-tone pixel shader)

           // Compute reflection vector:
           float  fFresnel    = saturate( dot( vNormalWorld, vView ) );
           float3 vReflection = 2 * vNormalWorld * fFresnel - vView;

           // Here we just use a constant gloss value to bias reading from the
           // environment map; in the real demo we use a gloss map which
           // specifies which regions will have reflection slightly blurred.
           float fEnvBias = glossLevel;

           // Sample environment map using this reflection vector and bias:
           float4 envMap = texCUBEbias( showroomMap, float4( vReflection, fEnvBias ) );

           // Premultiply by alpha:
           envMap.rgb = envMap.rgb * envMap.a;

           // Brighten the environment map sampling result:
           envMap.rgb *= brightnessFactor;

           // Combine result of environment map reflection with the paint color:
           float fEnvContribution = 1.0 - 0.5 * fFresnel;

           return float4( envMap.rgb * fEnvContribution, 1.0 );
        }

    37. Compositing Multi-Tone Base Layer and Microflake Layer
        - Base color and flake effect are derived from Np1 and Np2 using the following polynomial:
          color0 * (Np1 · V) + color1 * (Np1 · V)^2 + color2 * (Np1 · V)^4 + color3 * (Np2 · V)^16

    38. Compositing Final Look

    39. Original Hand-Tuned Assembly

    40. Car Paint Shader HLSL Compiler Disassembly Output

    41. Full Result of Multi-Layer Paint

    42. Translucent Iridescent Shader: Butterfly Wings
        Speaker notes: Perhaps a better screen shot with the butterfly body included.

    43. Translucent Iridescent Shader: Butterfly Wings
        - Simulates translucency of delicate butterfly wings
          - Wings glow from scattered reflected light
          - Similar to the effect of softly backlit rice paper
        - Displays subtle iridescent lighting
          - Similar to rainbow pattern on the surface of soap bubbles
          - Caused by the interference of light waves resulting from multiple reflections of light off of surfaces of varying thickness
        - Combines gloss, opacity and normal maps for a multi-layered final look
          - Gloss map contributes to satiny highlights
          - Opacity map allows portions of wings to be transparent
          - Normal map is used to give wings a bump-mapped look
        Speaker notes: A translucent material allows light to pass through, yet it isn't transparent. It receives light and can be luminous only from an outside source. If you hold a sheet of paper in front of a light source, you can see that the light makes it glow, yet you cannot see the light source through the paper because the paper scatters the light. Iridescence, which can be detected as a rainbow pattern on the surface of soap bubbles and gasoline spills, is the effect caused by the interference of light waves resulting from multiple reflections of light off of surfaces of varying thickness. Mother-of-pearl and compact discs share this quality with the wings of some butterflies; for example, Morpho butterfly wings emit a brilliant blue color while other colors are absorbed.

    44. RenderMonkey Butterfly Wings Shader Example
        - Parameters that contribute to the translucency and iridescence look:
          - Light position and scene ambient color
          - Translucency coefficient
          - Gloss scale and bias
          - Scale and bias for speed of iridescence change
        - Workspace: Iridescent Butterfly.rfx

    45. Translucent Iridescent Shader: Vertex Shader

           ...
           // Propagate input texture coordinates:
           Out.Tex = Tex;

           // Define tangent space matrix:
           float3x3 mTangentSpace;
           mTangentSpace[0] = Tangent;
           mTangentSpace[1] = Binormal;
           mTangentSpace[2] = Normal;

           // Compute the light vector (object space):
           float3 vLight = normalize( mul( inv_view_matrix, lightPos ) - Pos );

           // Output light vector in tangent space:
           Out.Light = mul( mTangentSpace, vLight );

           // Compute the view vector (object space):
           float3 vView = normalize( mul( inv_view_matrix, float4( 0, 0, 0, 1 ) ) - Pos );

           // Output view vector in tangent space:
           Out.View = mul( mTangentSpace, vView );

           // Compute the half angle vector (in tangent space):
           Out.Half = mul( mTangentSpace, normalize( vView + vLight ) );

           return Out;

    46. Translucent Iridescent Shader: Loading Information

    47. Diffuse Illumination For Translucency

    48. Adding Opacity to Butterfly Wings
        - Resulting color is modulated by the opacity value to add transparency to the wings
        Speaker notes: Normally when you want to blend something that's transparent, you would just do it in your alpha blending stage. But if it's specular, you don't want to do that before you apply the specular highlights. One way to do it properly would be to multipass (do one diffuse pass and one specular additive pass), but this is an approach to do it in a single pass.

    49. Making Butterfly Wings Iridescent

    50. Assembling Final Color

    51. HLSL Disassembly Comparison

    52. Example of Translucent Iridescent Shader

    53. Optimization Study: Überlight
        - Flexible light described in the JGT article "Lighting Controls for Computer Cinematography" by Ronen Barzel of Pixar
        - Überlight is procedural and has many controls:
          - light type, intensity, light color, cuton, cutoff, near edge, far edge, falloff, falloff distance, max intensity, parallel rays, shearx, sheary, width, height, width edge, height edge, roundness and beam distribution
        - Code here is based upon the public domain RenderMan implementation by Larry Gritz
        Speaker notes: JGT == Journal of Graphics Tools

    54. Überlight Spotlight Mode
        - Spotlight mode defines a procedural volume with smooth boundaries
        - Shape of spotlight is made up of two nested superellipses which are swept along the direction of the light
        - Also has smooth cuton and cutoff planes
        - Can tune parameters to get all sorts of looks

    55. Überlight Spotlight Volume
        Speaker notes: Cuton and cutoff planes are left out for this diagram.

    56. Überlight Spotlight Volume
        Speaker notes: Cuton and cutoff planes are left out for this diagram.

    57. Original clipSuperellipse() routine
        Speaker notes: This is a key subroutine in the Überlight shader. It computes attenuation as a function of a point's position in the swept superellipses: 1 inside the inner ellipse, 0 outside the outer ellipse, and a smoothstep in between.

    58. Vectorized Version
        Speaker notes: The R3x0 cycle counts are lower due to the ability to co-issue as well as some other secret sauce we aren't telling about.

    59. smoothstep() function
        - Standard function in procedural shading
        - Intrinsics built into RenderMan and DirectX HLSL

    60. C implementation

    61. HLSL implementation
        - The free saturate handles x outside of the [edge0..edge1] range
        Speaker notes: Know how to use saturate to do this kind of thresholding for you.
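
The listing for this slide did not survive in the transcript. The sketch below shows one common way to write the function so that saturate does the thresholding; it is not necessarily the code that was on the slide, and the name `SmoothStepScalar` is chosen here only to avoid shadowing the built-in smoothstep intrinsic.

```hlsl
float SmoothStepScalar( float edge0, float edge1, float x )
{
    // Scale and bias x into the 0..1 range. saturate() clamps values
    // that fall outside [edge0..edge1], so no explicit comparisons
    // against the edges are needed.
    x = saturate( ( x - edge0 ) / ( edge1 - edge0 ) );

    // Evaluate the Hermite polynomial 3x^2 - 2x^3.
    return x * x * ( 3 - 2 * x );
}
```

On the shader models discussed here, saturate generally maps to an instruction modifier rather than a separate instruction, which is why the slide calls it free.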

    62. Vectorized HLSL Implementation
        - Operation performed on float3s to compute three different smoothstep operations in parallel
        Speaker notes: This multiplication can be done as a vector operation, while rcp is defined to be a scalar operation and hence would have broken the vector nature of this routine. OneOverWidth is computed outside of the shader for two of the three smoothsteps in Überlight, so this optimization is a win.
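
A sketch of what the vectorized routine described in these notes might look like. The function name `SmoothStep3` is hypothetical; `OneOverWidth` follows the name used in the notes and is assumed to hold 1 / (edge1 - edge0) per component, precomputed by the caller or the application where possible.

```hlsl
float3 SmoothStep3( float3 edge0, float3 OneOverWidth, float3 x )
{
    // Multiplying by a precomputed reciprocal keeps this a single
    // vector operation; a divide would need a scalar rcp for each
    // component.
    x = saturate( ( x - edge0 ) * OneOverWidth );

    // Evaluate the polynomial on all three components at once.
    return x * x * ( 3 - 2 * x );
}
```

Passing the reciprocal as a parameter is the design choice that matters: when the widths are constants, the application can supply them directly and the shader never computes a reciprocal at all.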

    63. Summary
        - Writing optimal HLSL code
          - Compiling issues
          - Optimization strategies
          - Code structure pointers
        - Shader Examples
          - Shipped with RenderMonkey version 1.0; see www.ati.com/developer