Vertex Shader Tricks

New Ways to Use the Vertex Shader to Improve Performance

Bill Bilodeau, Developer Technology Engineer, AMD



Topics Covered

  • Overview of the DX11 front-end pipeline

  • Common bottlenecks

  • Advanced Vertex Shader Features

  • Vertex Shader Techniques

  • Samples and Results


DX11 Front-End Pipeline

[Diagram: Graphics Hardware front end. Input Assembler → Vertex Shader → Hull Shader → Tessellator → Domain Shader → Geometry Shader → … → Stream Out, with CB, SRV, or UAV resources attached to the shader stages.]

  • VS – vertex data

  • HS – control points

  • Tessellator

  • DS – generated vertices

  • GS – primitives

  • Write to UAV at all stages

    • Starting with DX11.1



Bottlenecks - VS

  • VS Attributes

    • Limit outputs to 4 attributes (AMD)

      • This applies to all shader stages (except PS)

  • VS Texture Fetches

    • Too many texture fetches can add latency

      • Especially dependent texture fetches

      • Group fetches together for better performance

      • Hide latency with ALU instructions


Bottlenecks - VS

  • Use the caches wisely

    • Avoid large vertex formats that waste pre-VS cache space

    • DrawIndexed() allows for reuse of processed vertices saved in the post-VS cache

      • Vertices with the same index only need to get processed once

[Diagram: Input Assembler → Pre-VS Cache (hides latency) → Vertex Shader → Post-VS Cache (vertex reuse)]
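The effect of the post-VS cache can be modeled on the CPU. The sketch below counts how many times the vertex shader would run for an index list, assuming a simple FIFO post-VS cache of `cacheSize` entries; real GPUs use other cache sizes and replacement policies, so the counts are only illustrative. `countVSInvocations` and `makeGridIndices` are hypothetical helpers, not part of any API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts vertex shader invocations for an indexed draw, assuming a simple
// FIFO post-VS cache that holds `cacheSize` shaded vertices (a model only).
std::size_t countVSInvocations(const std::vector<std::uint32_t>& indices,
                               std::size_t cacheSize)
{
    assert(cacheSize > 0);
    std::deque<std::uint32_t> cache;  // oldest entry at the front
    std::size_t invocations = 0;
    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue;                 // hit: reuse the already shaded vertex
        ++invocations;                // miss: the VS runs for this vertex
        cache.push_back(index);
        if (cache.size() > cacheSize)
            cache.pop_front();        // evict the oldest entry
    }
    return invocations;
}

// Builds an indexed triangle list for a grid of w x h quads, row by row.
std::vector<std::uint32_t> makeGridIndices(std::uint32_t w, std::uint32_t h)
{
    std::vector<std::uint32_t> indices;
    for (std::uint32_t y = 0; y < h; ++y) {
        for (std::uint32_t x = 0; x < w; ++x) {
            const std::uint32_t topLeft = y * (w + 1) + x;
            const std::uint32_t topRight = topLeft + 1;
            const std::uint32_t bottomLeft = topLeft + (w + 1);
            const std::uint32_t bottomRight = bottomLeft + 1;
            indices.insert(indices.end(), {topLeft, topRight, bottomLeft,
                                           bottomLeft, topRight, bottomRight});
        }
    }
    return indices;
}
```

For the single quad {0, 1, 2, 2, 1, 3}, the model reports 4 invocations instead of the 6 a non-indexed Draw() would run; with a one-entry cache it reports 5, which shows why cache size and index order matter.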



Bottlenecks - GS

  • GS

    • Can add or remove primitives

    • Adding new primitives requires storing new vertices

      • Going off chip to store data can be a bandwidth issue

    • Using the GS means another shader stage

      • This means more competition for shader resources

      • Better if you can do everything in the VS



Advanced Vertex Shader Features

  • SV_VertexID, SV_InstanceID

  • UAV output (DX11.1)

  • NULL vertex buffer

    • VS can create its own vertex data



SV_VertexID

  • Can use the vertex id to decide what vertex data to fetch

  • Fetch from SRV, or procedurally create a vertex

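    StructuredBuffer<float3> g_VertexBuffer;  // assumed declaration: vertex data bound as an SRV
    VSOut VSMain(uint id : SV_VertexID)
    {
        VSOut output;
        float3 vertex = g_VertexBuffer[id];  // the vertex ID chooses what to fetch
        output.pos = float4(vertex, 1.0);    // transform as needed
        return output;
    }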
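As a sketch of the "procedurally create a vertex" case, the C++ function below computes what a VS could compute from SV_VertexID alone, with no vertex data fetched: the position of vertex `id` in a flat grid that is `width` vertices wide. The grid layout, the `Float3` type, and the function name are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct Float3 { float x, y, z; };

// Hypothetical procedural vertex: vertex `id` of a grid that is `width`
// vertices wide, laid out in the XZ plane with `spacing` between vertices.
// Everything is derived from the ID, as a VS could do from SV_VertexID.
Float3 gridVertexFromId(std::uint32_t id, std::uint32_t width, float spacing)
{
    assert(width > 0);
    const std::uint32_t column = id % width;  // position within the row
    const std::uint32_t row = id / width;     // which row
    return Float3{ column * spacing, 0.0f, row * spacing };
}
```

For example, gridVertexFromId(5, 4, 2.0f) lands in column 1 of row 1 and returns (2, 0, 2).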



UAV buffers

  • Write to UAVs from a Vertex Shader

    • New feature in DX11.1 (UAV at any stage)

  • Can be used instead of stream-out for writing vertex data

    • Triangle output not limited to strips

      • You can use whatever format you want

  • Can output anything useful to a UAV



NULL Vertex Buffer

  • DX11/DX10 allows this

    • Just set the number of vertices in Draw()

    • VS will execute without a vertex buffer bound

  • Can be used for instancing

    • Call Draw() with the total number of vertices

    • Bind mesh and instance data as SRVs
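The NULL-vertex-buffer instancing bullets can be modeled on the CPU as follows. This sketch assumes a non-indexed draw of `instanceCount * meshIndices.size()` vertices and per-instance data that is just a translation; each vertex ID is split into an instance number and a slot in the mesh's index list, then both "SRVs" (here plain vectors) are read. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

// CPU model of instancing with a NULL vertex buffer. Draw() is called with
// instanceCount * meshIndices.size() vertices; each SV_VertexID is split into
// an instance number and a slot in the mesh's index list. On the GPU the mesh
// positions, mesh indices and per-instance offsets would all be SRVs.
Float3 instancedVertex(std::uint32_t vertexId,
                       const std::vector<Float3>& meshPositions,
                       const std::vector<std::uint32_t>& meshIndices,
                       const std::vector<Float3>& instanceOffsets)
{
    assert(!meshIndices.empty());
    const auto indicesPerInstance = static_cast<std::uint32_t>(meshIndices.size());
    const std::uint32_t instance = vertexId / indicesPerInstance;
    const std::uint32_t slot = vertexId % indicesPerInstance;
    assert(instance < instanceOffsets.size());
    const Float3& p = meshPositions[meshIndices[slot]];  // mesh data
    const Float3& o = instanceOffsets[instance];          // instance data
    return Float3{ p.x + o.x, p.y + o.y, p.z + o.z };
}
```

With a one-triangle mesh and two instances offset by 0 and 10 along x, Draw() would be called with 6 vertices, and vertex ID 4 resolves to instance 1, mesh vertex 1, at (11, 0, 0).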



Vertex Shader Techniques

  • Full Screen Triangle

  • Vertex Shader Instancing

    • Merged Instancing

  • Vertex Shader UAVs



Full Screen Triangle

  • For post-processing effects

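    • A single triangle has better performance than a two-triangle quad: with a quad, pixels along the shared diagonal are shaded twice, because the GPU shades in 2×2 pixel blocks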

  • Fast and easy with VS generated coordinates

    • No IB or VB is necessary

  • Something you should be using for full screen effects

[Figure: the triangle in clip space coordinates, with vertices at (-1, -1, 0), (-1, 3, 0) and (3, -1, 0)]



Full Screen Triangle: C++ code

// Null VB, IB
pd3dImmediateContext->IASetVertexBuffers( 0, 0, NULL, NULL, NULL );
pd3dImmediateContext->IASetIndexBuffer( NULL, (DXGI_FORMAT)0, 0 );
pd3dImmediateContext->IASetInputLayout( NULL );

// Set Shaders
pd3dImmediateContext->VSSetShader( g_pFullScreenVS, NULL, 0 );
pd3dImmediateContext->PSSetShader( … );
pd3dImmediateContext->PSSetShaderResources( … );
pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );

// Render 3 vertices for the triangle
pd3dImmediateContext->Draw(3, 0);



Full Screen Triangle: HLSL Code

VSOutput VSFullScreenTest(uint id : SV_VERTEXID)
{
    VSOutput output;

    // generate clip space position
    output.pos.x = (float)(id / 2) * 4.0 - 1.0;
    output.pos.y = (float)(id % 2) * 4.0 - 1.0;
    output.pos.z = 0.0;
    output.pos.w = 1.0;

    // texture coordinates
    output.tex.x = (float)(id / 2) * 2.0;
    output.tex.y = 1.0 - (float)(id % 2) * 2.0;

    // color
    output.color = float4(1, 1, 1, 1);

    return output;
}

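To check the generated coordinates, here is a C++ port of the VSFullScreenTest math plus a helper that linearly interpolates the texture coordinates at a clip-space point (linear interpolation is valid here because w = 1 at every vertex). The helper relies on this particular triangle's edges from vertex 0 being axis-aligned; `texAt` and `FullScreenVertex` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

struct FullScreenVertex { float pos[4]; float tex[2]; };

// C++ port of the VSFullScreenTest math: IDs 0, 1, 2 map to clip space
// (-1,-1), (-1,3), (3,-1) and texture coordinates (0,1), (0,-1), (2,1).
FullScreenVertex fullScreenVertex(std::uint32_t id)
{
    assert(id < 3);
    FullScreenVertex v{};
    v.pos[0] = static_cast<float>(id / 2) * 4.0f - 1.0f;
    v.pos[1] = static_cast<float>(id % 2) * 4.0f - 1.0f;
    v.pos[2] = 0.0f;
    v.pos[3] = 1.0f;
    v.tex[0] = static_cast<float>(id / 2) * 2.0f;
    v.tex[1] = 1.0f - static_cast<float>(id % 2) * 2.0f;
    return v;
}

// Interpolates the texture coordinates at clip-space point (x, y).
// The edge 0->1 runs along +y and the edge 0->2 along +x, so the
// barycentric weights of vertices 1 and 2 are simple ratios.
void texAt(float x, float y, float& u, float& v)
{
    const FullScreenVertex a = fullScreenVertex(0);
    const FullScreenVertex b = fullScreenVertex(1);
    const FullScreenVertex c = fullScreenVertex(2);
    const float s = (y - a.pos[1]) / (b.pos[1] - a.pos[1]);  // weight of vertex 1
    const float t = (x - a.pos[0]) / (c.pos[0] - a.pos[0]);  // weight of vertex 2
    u = a.tex[0] + s * (b.tex[0] - a.tex[0]) + t * (c.tex[0] - a.tex[0]);
    v = a.tex[1] + s * (b.tex[1] - a.tex[1]) + t * (c.tex[1] - a.tex[1]);
}
```

The screen corners (-1, 1), (1, 1) and (1, -1) come out as (0, 0), (1, 0) and (1, 1), so the visible part of the triangle covers exactly the [0, 1] texture range; everything outside the viewport is clipped.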



VS Instancing: Point Sprites

  • Often done on GS, but can be faster on VS

    • Create an SRV point buffer and bind to VS

    • Call Draw or DrawIndexed to render the full triangle list.

    • Read the location from the point buffer and expand to vertex location in quad

    • Can be used for particles or Bokeh DOF sprites

    • Don’t use DrawInstanced for a small mesh



Point Sprites: C++ Code

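// Bind the particle index buffer (6 indices per particle)
pd3dImmediateContext->IASetIndexBuffer( g_pParticleIndexBuffer, DXGI_FORMAT_R32_UINT, 0 );
pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
// Two triangles per particle
pd3dImmediateContext->DrawIndexed( g_particleCount * 6, 0, 0 );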
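The slides do not show the contents of g_pParticleIndexBuffer. The sketch below is one way to build it, assuming the corner layout used by VSIndexBuffer (id / 4 selects the particle, id % 4 the corner: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right). The triangle order is an assumption; with this layout both triangles have the same clockwise winding, which is Direct3D's default front face.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds index data for a buffer like g_pParticleIndexBuffer (R32_UINT):
// 6 indices per particle that reference 4 vertex IDs. In the VS, id / 4 is
// the particle and id % 4 the corner: 0 = top-left, 1 = top-right,
// 2 = bottom-left, 3 = bottom-right.
std::vector<std::uint32_t> buildParticleIndices(std::uint32_t particleCount)
{
    assert(particleCount <= UINT32_MAX / 4);  // IDs must fit in 32 bits
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(particleCount) * 6);
    for (std::uint32_t p = 0; p < particleCount; ++p) {
        const std::uint32_t base = p * 4;
        // Triangles (0, 1, 2) and (2, 1, 3): both clockwise for this corner layout.
        const std::uint32_t quad[6] = { base + 0, base + 1, base + 2,
                                        base + 2, base + 1, base + 3 };
        indices.insert(indices.end(), quad, quad + 6);
    }
    return indices;
}
```

Since DrawIndexed() delivers these index values to the VS as SV_VertexID, corners 1 and 2 of each particle appear twice and can be served from the post-VS cache, so each particle typically needs 4 VS invocations rather than 6.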



Point Sprites: HLSL Code

VSInstancedParticleDrawOut VSIndexBuffer(uint id : SV_VERTEXID)
{
    VSInstancedParticleDrawOut output;
    uint particleIndex = id / 4;
    uint vertexInQuad = id % 4;

    // calculate the position of the vertex
    float3 position;
    position.x = (vertexInQuad % 2) ? 1.0 : -1.0;
    position.y = (vertexInQuad & 2) ? -1.0 : 1.0;
    position.z = 0.0;
    position.xy *= PARTICLE_RADIUS;
    position = mul( position, (float3x3)g_mInvView ) + g_bufPosColor[particleIndex].pos.xyz;
    output.pos = mul( float4(position, 1.0), g_mWorldViewProj );
    output.color = g_bufPosColor[particleIndex].color;

    // texture coordinate
    output.tex.x = (vertexInQuad % 2) ? 1.0 : 0.0;
    output.tex.y = (vertexInQuad & 2) ? 1.0 : 0.0;

    return output;
}


Point Sprite Performance

[Chart: point sprite rendering performance on AMD Radeon R9 290X and Nvidia Titan]



Point Sprite Performance

  • DrawIndexed() is the fastest method

  • Draw() is slower but doesn’t need an IB

  • Don’t use DrawInstanced() for creating sprites on either AMD or Nvidia hardware

    • DrawInstanced() is not recommended for meshes with a small number of vertices
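The point sprite HLSL assumes an index buffer (four unique vertex IDs per particle). For the Draw() path without an IB, one possible mapping, sketched here in C++, is Draw(g_particleCount * 6, 0) with six consecutive IDs per particle; the shader's id / 4 and id % 4 would become id / 6 and a six-entry corner table. The table, struct, and function names are hypothetical, chosen to match a quad split into triangles (0, 1, 2) and (2, 1, 3).

```cpp
#include <cassert>
#include <cstdint>

// Quad corner numbering as in VSIndexBuffer:
// 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
struct SpriteCorner { std::uint32_t particleIndex; std::uint32_t vertexInQuad; };

// Hypothetical mapping for Draw(particleCount * 6, 0) with no index buffer:
// six consecutive vertex IDs per particle, forming triangles (0, 1, 2) and
// (2, 1, 3), so the VS uses id / 6 and a corner table instead of id / 4, id % 4.
SpriteCorner spriteCornerFromDrawId(std::uint32_t id, std::uint32_t particleCount)
{
    assert(id / 6 < particleCount);  // the ID must belong to a drawn particle
    static const std::uint32_t kCornerOfVertex[6] = { 0, 1, 2, 2, 1, 3 };
    return SpriteCorner{ id / 6, kCornerOfVertex[id % 6] };
}
```

Non-indexed draws give the post-VS cache no repeated indices to reuse, so each particle costs six VS invocations instead of four, which is one plausible reason Draw() measures slower than DrawIndexed().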



Merge Instancing

  • Combine multiple meshes that can be instanced many times

    • Better than normal instancing, which renders only one mesh per draw call

      • Instance nearby meshes for a smaller bounding box

  • Each mesh is a page in the vertex data

    • Fixed vertex count for each mesh

      • Meshes smaller than page size use degenerate triangles


Merge Instancing

[Diagram: Mesh Vertex Data is split into fixed-length pages (Mesh Data 0, Mesh Data 1, Mesh Data 2, …), each holding Vertex 0, Vertex 1, Vertex 2, Vertex 3, …; unused slots at the end of a page are filled with 0, forming a degenerate triangle. Mesh Instance Data lists the mesh each instance draws: Instance 0 → Mesh Index 2, Instance 1 → Mesh Index 0, …]



Merged Instancing using VS

  • Use the vertex ID to look up the mesh to instance

    • All meshes are the same size, so (id / SIZE) selects the instance, whose instance data gives the mesh to fetch, and (id % SIZE) selects the vertex within that mesh

    • Faster than using DrawInstanced()
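A CPU model of this lookup follows. It assumes each mesh is a non-indexed triangle list padded with zero vertices to a page size that is a multiple of 3, and that the per-instance data holds only a mesh index (real instance data would also carry a transform). `buildPages` and `mergeInstancedVertex` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

// Packs each mesh (a non-indexed triangle list) into a fixed-size page.
// Unused slots stay zero, forming degenerate (zero-area) triangles that
// produce no pixels.
std::vector<Float3> buildPages(const std::vector<std::vector<Float3>>& meshes,
                               std::size_t pageSize)
{
    assert(pageSize % 3 == 0);  // pages must hold whole triangles
    std::vector<Float3> pages(meshes.size() * pageSize, Float3{0.0f, 0.0f, 0.0f});
    for (std::size_t m = 0; m < meshes.size(); ++m) {
        assert(meshes[m].size() <= pageSize);
        for (std::size_t v = 0; v < meshes[m].size(); ++v)
            pages[m * pageSize + v] = meshes[m][v];
    }
    return pages;
}

// What the VS does with SV_VertexID for Draw(instanceCount * pageSize):
// id / pageSize selects the instance, its instance data names the mesh page,
// and id % pageSize selects the vertex inside that page.
Float3 mergeInstancedVertex(std::uint32_t id, std::uint32_t pageSize,
                            const std::vector<Float3>& pages,
                            const std::vector<std::uint32_t>& instanceMeshIndex)
{
    const std::uint32_t instance = id / pageSize;
    const std::uint32_t vertex = id % pageSize;
    assert(instance < instanceMeshIndex.size());
    const std::uint32_t mesh = instanceMeshIndex[instance];
    return pages[static_cast<std::size_t>(mesh) * pageSize + vertex];
}
```

With pages of 6 vertices and instance data {2, 0}, vertex ID 1 reads vertex 1 of mesh 2, and vertex ID 10 reads slot 4 of mesh 0's page, which is zero padding when mesh 0 has only 3 vertices.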



Merge Instancing Performance

  • Instancing performance test by Cloud Imperium Games for Star Citizen

  • Renders 13.5M triangles (~40M verts)

  • DrawInstanced version calls DrawInstanced() and uses instance data in a vertex buffer

  • Soft Instancing version uses vertex instancing with Draw() calls and fetches instance data from SRV

[Chart: frame time (ms) of the DrawInstanced and Soft Instancing versions on AMD Radeon R9 290X and Nvidia GTX 780]



Vertex Shader UAVs

  • Random access Read/Write in a VS

  • Can be used to store transformed vertex data for use in multi-pass algorithms

  • Can be used for passing constant attributes between any shader stages (not just from the VS)



Skinning to UAV

  • Skin vertex data then output to UAV

    • Instance the skinned UAV data multiple times

  • Can also be used for non-instanced data

    • Multiple passes can reuse the transformed vertex data, e.g. for shadow map rendering

  • Performance is about the same as stream-out, but you can do more …
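The skinning step itself is not shown in the deck. Assuming standard linear blend skinning with four bone influences per vertex, a CPU model of "skin, then write to a UAV" looks like this; `outUav` stands in for the UAV, indexed by vertex ID, and every type and function name here is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };

// Row-major 3x4 affine bone matrix: 3x3 rotation/scale plus translation in column 3.
using Bone = std::array<std::array<float, 4>, 3>;

struct SkinnedVertexIn {
    Float3 pos;
    std::array<unsigned, 4> bones;   // bone indices
    std::array<float, 4> weights;    // blend weights, expected to sum to 1
};

Float3 transformPoint(const Bone& m, const Float3& p)
{
    return Float3{ m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z + m[0][3],
                   m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z + m[1][3],
                   m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z + m[2][3] };
}

// CPU stand-in for a skinning VS that writes its result to a UAV:
// outUav[id] = linear-blend-skinned position of vertex `id`.
void skinToBuffer(const std::vector<SkinnedVertexIn>& in,
                  const std::vector<Bone>& bones,
                  std::vector<Float3>& outUav)
{
    outUav.resize(in.size());
    for (std::size_t id = 0; id < in.size(); ++id) {
        Float3 acc{0.0f, 0.0f, 0.0f};
        for (std::size_t i = 0; i < 4; ++i) {
            const float w = in[id].weights[i];
            if (w == 0.0f) continue;  // unused influence
            assert(in[id].bones[i] < bones.size());
            const Float3 p = transformPoint(bones[in[id].bones[i]], in[id].pos);
            acc.x += w * p.x;
            acc.y += w * p.y;
            acc.z += w * p.z;
        }
        outUav[id] = acc;
    }
}
```

Later passes, such as instanced draws or shadow map rendering, would read the skinned positions from the buffer instead of repeating the bone math.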



Bounding Box to UAV

  • Can calculate and store the bounding box in the VS

    • Use a UAV to store the min/max values (6 values: min x, y, z and max x, y, z)

    • InterlockedMin/InterlockedMax determine the min and max of the bbox

      • Need to use integer values with atomics (HLSL atomics operate on int/uint only)

  • Use the stored bbox in later passes

    • GPU physics (collision)

    • Tile based processing



Bounding Box: HLSL Code

RWBuffer<int> g_BBoxUAV;  // assumed declaration: min x, y, z then max x, y, z

void UAVBBoxSkinVS(VSSkinnedIn input, uint id : SV_VERTEXID )
{
    // skin the vertex
    . . .

    // output the max and min for the bounding box
    int x = (int) (vSkinned.Pos.x * FLOAT_SCALE); // convert to integer
    int y = (int) (vSkinned.Pos.y * FLOAT_SCALE);
    int z = (int) (vSkinned.Pos.z * FLOAT_SCALE);
    InterlockedMin(g_BBoxUAV[0], x);
    InterlockedMin(g_BBoxUAV[1], y);
    InterlockedMin(g_BBoxUAV[2], z);
    InterlockedMax(g_BBoxUAV[3], x);
    InterlockedMax(g_BBoxUAV[4], y);
    InterlockedMax(g_BBoxUAV[5], z);
    . . .
}
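The bounding-box accumulation can be modeled on the CPU with std::atomic. The sketch assumes a fixed-point scale of 1000 for FLOAT_SCALE (its real value is not given) and emulates InterlockedMin/InterlockedMax with compare-and-swap loops. It also makes explicit a step the slide leaves out: the six values must be reset (mins to INT_MAX, maxes to INT_MIN) before each pass. The scale trades precision for range, since |coordinate| × FLOAT_SCALE must fit in an int. All names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Assumed value for FLOAT_SCALE; the deck does not give one.
constexpr float kFloatScale = 1000.0f;

// CPU model of g_BBoxUAV: v[0..2] = min x, y, z and v[3..5] = max x, y, z.
struct BBoxBuffer {
    std::atomic<int> v[6];
    // Must run before each pass so the first vertex always wins.
    void reset() {
        for (int i = 0; i < 3; ++i) { v[i] = INT_MAX; v[i + 3] = INT_MIN; }
    }
};

// Emulates HLSL InterlockedMin / InterlockedMax with compare-and-swap loops.
void atomicMin(std::atomic<int>& a, int value) {
    int current = a.load();
    while (value < current && !a.compare_exchange_weak(current, value)) {}
}
void atomicMax(std::atomic<int>& a, int value) {
    int current = a.load();
    while (value > current && !a.compare_exchange_weak(current, value)) {}
}

// Float to fixed point; truncates toward zero like the HLSL (int) cast.
int toFixed(float f) {
    const float scaled = f * kFloatScale;
    assert(scaled > static_cast<float>(INT_MIN) && scaled < static_cast<float>(INT_MAX));
    return static_cast<int>(scaled);
}

// Per-vertex work of UAVBBoxSkinVS after skinning.
void accumulateVertex(BBoxBuffer& bbox, float px, float py, float pz) {
    const int x = toFixed(px), y = toFixed(py), z = toFixed(pz);
    atomicMin(bbox.v[0], x); atomicMin(bbox.v[1], y); atomicMin(bbox.v[2], z);
    atomicMax(bbox.v[3], x); atomicMax(bbox.v[4], y); atomicMax(bbox.v[5], z);
}

// Converts a stored component back to a float for later passes.
float fromFixed(const std::atomic<int>& a) {
    return static_cast<float>(a.load()) / kFloatScale;
}
```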



Particle System UAV

  • Single pass GPU-only particle system

  • In the VS:

    • Generate sprites for rendering

    • Do Euler integration and write the updated particle system state to a UAV



Particle System: HLSL Code

uint particleIndex = id / 4;
uint vertexInQuad = id % 4;

// calculate the new position of the vertex
float3 oldPosition = g_bufPosColor[particleIndex].pos.xyz;
float3 oldVelocity = g_bufPosColor[particleIndex].velocity.xyz;

// Euler integration to find new position and velocity
float3 acceleration = normalize(oldVelocity) * ACCELLERATION;
float3 newVelocity = acceleration * g_deltaT + oldVelocity;
float3 newPosition = newVelocity * g_deltaT + oldPosition;
g_particleUAV[particleIndex].pos = float4(newPosition, 1.0);
g_particleUAV[particleIndex].velocity = float4(newVelocity, 0.0);

// Generate sprite vertices
. . .
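A CPU sketch of the per-particle update follows. `acceleration` stands in for ACCELLERATION, and the separate `src`/`dst` vectors mirror the shader reading g_bufPosColor while writing g_particleUAV. Unlike the slide's code, it guards against a zero velocity, where normalize() would typically produce NaN. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };
struct Particle { Float3 pos; Float3 velocity; };

// CPU model of the particle update in the VS: read the old state from `src`
// (like g_bufPosColor) and write the new state to `dst` (like g_particleUAV).
void eulerStep(const std::vector<Particle>& src, std::vector<Particle>& dst,
               float acceleration, float dt)
{
    assert(&src != &dst);  // never overwrite the state being read
    dst.resize(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const Float3 v = src[i].velocity;
        const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
        // Acceleration along the velocity direction; zero if the particle is at rest.
        const float s = (len > 0.0f) ? acceleration / len : 0.0f;
        const Float3 newV{ v.x + v.x * s * dt, v.y + v.y * s * dt, v.z + v.z * s * dt };
        const Float3 p = src[i].pos;
        dst[i].pos = Float3{ p.x + newV.x * dt, p.y + newV.y * dt, p.z + newV.z * dt };
        dst[i].velocity = newV;
    }
}
```

Because the position is advanced with the new velocity, this is the semi-implicit (symplectic) variant of Euler integration. In the VS, all four vertices of a particle run the update and write identical values to the UAV; limiting the write to one corner (for example vertexInQuad == 0) is one way to avoid the redundant writes.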



Conclusion

  • Vertex shader “tricks” can be more efficient than more commonly used methods

    • Use SV_VertexID for smarter instancing

      • Sprites

      • Merge Instancing

    • UAVs add lots of freedom to vertex shaders

      • Bounding box calculation

      • Single pass VS particle system



Demos

  • Particle System

  • UAV Skinning

    • Bbox



Acknowledgements

  • Merge Instancing

    • Emil Persson, “Graphics Gems for Games”, SIGGRAPH 2011

    • Brendan Jackson, Cloud Imperium

  • Thanks to

    • Nick Thibieroz, AMD

    • Raul Aguaviva (particle system UAV), AMD

    • Alex Kharlamov, AMD


Questions