
Vertex Shader Tricks: New Ways to Use the Vertex Shader to Improve Performance
Bill Bilodeau, Developer Technology Engineer, AMD
Topics Covered
  • Overview of the DX11 front-end pipeline
  • Common bottlenecks
  • Advanced Vertex Shader Features
  • Vertex Shader Techniques
  • Samples and Results
DX11 Front-End Pipeline
  • VS – vertex data
  • HS – control points
  • Tessellator
  • DS – generated vertices
  • GS – primitives
  • Write to UAV at all stages
    • Starting with DX11.1

[Diagram: graphics hardware pipeline: Input Assembler → Vertex Shader → Hull Shader → Tessellator → Domain Shader → Geometry Shader → ... → Stream Out, with CB, SRV, or UAV shown alongside]

Bottlenecks - VS
  • VS Attributes
    • Limit outputs to 4 attributes (AMD)
      • This applies to all shader stages (except PS)
  • VS Texture Fetches
    • Too many texture fetches can add latency
      • Especially dependent texture fetches
      • Group fetches together for better performance
      • Hide latency with ALU instructions
Bottlenecks - VS
  • Use the caches wisely
    • Avoid large vertex formats that waste pre-VS cache space
    • DrawIndexed() allows for reuse of processed vertices saved in the post-VS cache
      • Vertices with the same index only need to get processed once

[Diagram: Input Assembler → Pre-VS Cache (hides latency) → Vertex Shader → Post-VS Cache (vertex reuse)]

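The vertex reuse point can be illustrated with a small CPU-side model. This is only a sketch: it assumes the post-VS cache behaves like a simple FIFO of recently shaded indices, and the function name `CountVsInvocations` and the cache size are hypothetical. Real hardware caches may behave differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts how many vertex shader invocations an indexed draw would need if the
// post-VS cache were a FIFO holding `cacheSize` recently shaded indices.
// The FIFO model and the cache size are assumptions for illustration only.
std::size_t CountVsInvocations(const std::vector<std::uint32_t>& indices,
                               std::size_t cacheSize)
{
    std::deque<std::uint32_t> cache;
    std::size_t invocations = 0;
    for (std::uint32_t index : indices) {
        const bool hit =
            std::find(cache.begin(), cache.end(), index) != cache.end();
        if (hit) continue;        // already shaded: the cached result is reused
        ++invocations;            // cache miss: the vertex shader runs
        cache.push_back(index);
        if (cache.size() > cacheSize) cache.pop_front();
    }
    return invocations;
}
```

For a quad drawn as two triangles with the indices 0, 1, 2, 2, 1, 3, this model shades 4 vertices when the cache can hold them and 6 when there is no cache at all.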
Bottlenecks - GS
  • GS
    • Can add or remove primitives
    • Adding new primitives requires storing new vertices
      • Going off chip to store data can be a bandwidth issue
    • Using the GS means another shader stage
      • This means more competition for shader resources
      • Better if you can do everything in the VS
Advanced Vertex Shader Features
  • SV_VertexID, SV_InstanceID
  • UAV output (DX11.1)
  • NULL vertex buffer
    • VS can create its own vertex data
SV_VertexID
  • Can use the vertex ID to decide what vertex data to fetch
  • Fetch from SRV, or procedurally create a vertex

VSOut VertexShader(uint id : SV_VertexID)
{
    // fetch the vertex from a buffer SRV using the vertex ID
    float3 vertex = g_VertexBuffer[id];
    . . .
}

UAV buffers
  • Write to UAVs from a Vertex Shader
    • New feature in DX11.1 (UAV at any stage)
  • Can be used instead of stream-out for writing vertex data
    • Triangle output not limited to strips
      • You can use whatever format you want
  • Can output anything useful to a UAV
NULL Vertex Buffer
  • DX11/DX10 allows this
    • Just set the number of vertices in Draw()
    • VS will execute without a vertex buffer bound
  • Can be used for instancing
    • Call Draw() with the total number of vertices
    • Bind mesh and instance data as SRVs
Vertex Shader Techniques
  • Full Screen Triangle
  • Vertex Shader Instancing
    • Merged Instancing
  • Vertex Shader UAVs
Full Screen Triangle
  • For post-processing effects
    • Triangle has better performance than quad
  • Fast and easy with VS generated coordinates
    • No IB or VB is necessary
  • Something you should be using for full screen effects

[Diagram: triangle with clip space coordinates (-1, 3, 0), (3, -1, 0), (-1, -1, 0)]

Full Screen Triangle: C++ code

// Null VB, IB
pd3dImmediateContext->IASetVertexBuffers( 0, 0, NULL, NULL, NULL );
pd3dImmediateContext->IASetIndexBuffer( NULL, (DXGI_FORMAT)0, 0 );
pd3dImmediateContext->IASetInputLayout( NULL );

// Set Shaders
pd3dImmediateContext->VSSetShader( g_pFullScreenVS, NULL, 0 );
pd3dImmediateContext->PSSetShader( … );
pd3dImmediateContext->PSSetShaderResources( … );
pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );

// Render 3 vertices for the triangle
pd3dImmediateContext->Draw(3, 0);

Full Screen Triangle: HLSL Code

VSOutput VSFullScreenTest(uint id : SV_VERTEXID)
{
    VSOutput output;

    // generate clip space position
    output.pos.x = (float)(id / 2) * 4.0 - 1.0;
    output.pos.y = (float)(id % 2) * 4.0 - 1.0;
    output.pos.z = 0.0;
    output.pos.w = 1.0;

    // texture coordinates
    output.tex.x = (float)(id / 2) * 2.0;
    output.tex.y = 1.0 - (float)(id % 2) * 2.0;

    // color
    output.color = float4(1, 1, 1, 1);

    return output;
}

[Diagram: triangle with clip space coordinates (-1, 3, 0), (-1, -1, 0), (3, -1, 0)]
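To check the arithmetic of the full screen triangle shader on the CPU, the same expressions can be ported to C++. This is a sketch for verification only; the struct `FullScreenVertex` and the function `MakeFullScreenVertex` are hypothetical names and not part of the presentation.

```cpp
#include <cassert>
#include <cstdint>

struct FullScreenVertex {
    float x, y, z, w;   // clip space position
    float u, v;         // texture coordinates
};

// CPU port of the vertex ID arithmetic used by the full screen triangle shader.
FullScreenVertex MakeFullScreenVertex(std::uint32_t id)
{
    FullScreenVertex out{};
    out.x = static_cast<float>(id / 2) * 4.0f - 1.0f;
    out.y = static_cast<float>(id % 2) * 4.0f - 1.0f;
    out.z = 0.0f;
    out.w = 1.0f;
    out.u = static_cast<float>(id / 2) * 2.0f;
    out.v = 1.0f - static_cast<float>(id % 2) * 2.0f;
    return out;
}
```

Vertex IDs 0, 1, and 2 produce the positions (-1, -1), (-1, 3), and (3, -1). The texture coordinates run from 0 to 2 along each axis, so the part of the triangle that covers the screen maps to the 0 to 1 range.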

VS Instancing: Point Sprites
  • Often done on GS, but can be faster on VS
    • Create an SRV point buffer and bind to VS
    • Call Draw or DrawIndexed to render the full triangle list.
    • Read the location from the point buffer and expand to vertex location in quad
    • Can be used for particles or Bokeh DOF sprites
    • Don’t use DrawInstanced for a small mesh
Point Sprites: C++ Code

// 6 indices (two triangles) per particle
pd3dImmediateContext->IASetIndexBuffer( g_pParticleIndexBuffer, DXGI_FORMAT_R32_UINT, 0 );
pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
pd3dImmediateContext->DrawIndexed( g_particleCount * 6, 0, 0 );
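The slides do not show how the particle index buffer is filled. One plausible layout, sketched below, uses 4 virtual vertices and 6 indices per particle, so a shader that computes id / 4 and id % 4 recovers the particle and the corner. The function name `BuildParticleIndices` and the corner order (0, 1, 2) (2, 1, 3) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a 32-bit index buffer with 6 indices (two triangles) per particle.
// Each particle owns 4 consecutive virtual vertex IDs: p*4 .. p*4+3.
// The corner order below is an assumption, not taken from the slides.
std::vector<std::uint32_t> BuildParticleIndices(std::uint32_t particleCount)
{
    static const std::uint32_t kQuad[6] = { 0, 1, 2, 2, 1, 3 };
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(particleCount) * 6);
    for (std::uint32_t p = 0; p < particleCount; ++p) {
        for (std::uint32_t corner : kQuad) {
            indices.push_back(p * 4 + corner);
        }
    }
    return indices;
}
```

The resulting vector would be uploaded once as an immutable index buffer sized for the maximum particle count, since its contents never depend on the particle data.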

Point Sprites: HLSL Code

VSInstancedParticleDrawOut VSIndexBuffer(uint id : SV_VERTEXID)
{
    VSInstancedParticleDrawOut output;

    uint particleIndex = id / 4;
    uint vertexInQuad = id % 4;

    // calculate the position of the vertex
    float3 position;
    position.x = (vertexInQuad % 2) ? 1.0 : -1.0;
    position.y = (vertexInQuad & 2) ? -1.0 : 1.0;
    position.z = 0.0;
    position.xy *= PARTICLE_RADIUS;

    position = mul( position, (float3x3)g_mInvView ) + g_bufPosColor[particleIndex].pos.xyz;

    output.pos = mul( float4(position, 1.0), g_mWorldViewProj );
    output.color = g_bufPosColor[particleIndex].color;

    // texture coordinate
    output.tex.x = (vertexInQuad % 2) ? 1.0 : 0.0;
    output.tex.y = (vertexInQuad & 2) ? 1.0 : 0.0;

    return output;
}
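The corner selection in the sprite shader can be checked on the CPU with a direct port of the id / 4 and id % 4 arithmetic. This sketch leaves out the radius scaling and the matrix transforms; `SpriteCorner` and `DecodeSpriteCorner` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct SpriteCorner {
    std::uint32_t particleIndex;
    float offsetX, offsetY;   // unit quad corner, before scaling by the radius
    float u, v;               // texture coordinates
};

// CPU port of the per-vertex decoding: 4 vertex IDs per particle.
SpriteCorner DecodeSpriteCorner(std::uint32_t id)
{
    const std::uint32_t vertexInQuad = id % 4;
    SpriteCorner c{};
    c.particleIndex = id / 4;
    c.offsetX = (vertexInQuad % 2) ? 1.0f : -1.0f;
    c.offsetY = (vertexInQuad & 2) ? -1.0f : 1.0f;
    c.u = (vertexInQuad % 2) ? 1.0f : 0.0f;
    c.v = (vertexInQuad & 2) ? 1.0f : 0.0f;
    return c;
}
```

Vertex IDs 0 to 3 map to the top-left, top-right, bottom-left, and bottom-right corners of particle 0, and IDs 4 to 7 repeat the pattern for particle 1.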

Point Sprite Performance

[Charts: point sprite performance on AMD Radeon R9 290X and Nvidia Titan]

Point Sprite Performance
  • DrawIndexed() is the fastest method
  • Draw() is slower but doesn’t need an IB
  • Don’t use DrawInstanced() for creating sprites on either AMD or NVidia hardware
    • Not recommended for a small number of vertices
Merge Instancing
  • Combine multiple meshes that can be instanced many times
    • Better than normal instancing which renders only one mesh
      • Instance nearby meshes for smaller bounding box
  • Each mesh is a page in the vertex data
    • Fixed vertex count for each mesh
      • Meshes smaller than page size use degenerate triangles
Merge Instancing

[Diagram: Mesh Vertex Data is split into fixed-length pages (Mesh Data 0, Mesh Data 1, Mesh Data 2, ...). Each page holds Vertex 0, Vertex 1, Vertex 2, Vertex 3, ... and is padded with zeros that form degenerate triangles. Mesh Instance Data holds one entry per instance with a mesh index (Instance 0: Mesh Index 2, Instance 1: Mesh Index 0, ...).]

Merged Instancing using VS
  • Use the vertex ID to look up the mesh to instance
    • All meshes are the same size, so (id / SIZE) can be used as an offset to the mesh
    • Faster than using DrawInstanced()
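The lookup described above can be sketched on the CPU as follows. This assumes one instance record per fixed-size page of vertex IDs; the struct layout, the per-instance translation, the page size of 3, and the name `FetchMergedVertex` are all hypothetical and chosen only to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

struct MergedInstance {
    std::uint32_t meshIndex;  // which fixed-size page of mesh data to draw
    Float3 offset;            // hypothetical per-instance translation
};

constexpr std::uint32_t kPageSize = 3;  // vertices per page (assumed, tiny on purpose)

// Maps a flat vertex ID to an instance and a vertex within that instance's page.
Float3 FetchMergedVertex(std::uint32_t id,
                         const std::vector<Float3>& meshVertices,
                         const std::vector<MergedInstance>& instances)
{
    const std::uint32_t instanceIndex = id / kPageSize;
    const std::uint32_t vertexInPage  = id % kPageSize;
    const MergedInstance& inst = instances[instanceIndex];
    const Float3& v =
        meshVertices[static_cast<std::size_t>(inst.meshIndex) * kPageSize + vertexInPage];
    return { v.x + inst.offset.x, v.y + inst.offset.y, v.z + inst.offset.z };
}
```

With this layout a single Draw() of instanceCount * kPageSize vertices covers every instance, and meshes shorter than a page rely on the zero padding to produce degenerate triangles.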
Merge Instancing Performance
  • Instancing performance test by Cloud Imperium Games for Star Citizen
  • Renders 13.5M triangles (~40M verts)
  • DrawInstanced version calls DrawInstanced() and uses instance data in a vertex buffer
  • Soft Instancing version uses vertex instancing with Draw() calls and fetches instance data from SRV

[Chart: results in ms on AMD Radeon R9 290X and Nvidia GTX 780]

Vertex Shader UAVs
  • Random access Read/Write in a VS
  • Can be used to store transformed vertex data for use in multi-pass algorithms
  • Can be used for passing constant attributes between any shader stage (not just from VS)
Skinning to UAV
  • Skin vertex data then output to UAV
    • Instance the skinned UAV data multiple times
  • Can also be used for non-instanced data
    • Multiple passes can reuse the transformed vertex data – Shadow map rendering
  • Performance is about the same as stream-out, but you can do more …
Bounding Box to UAV
  • Can calculate and store Bbox in the VS
    • Use a UAV to store the min/max values (6)
    • InterlockedMin/InterlockedMax determine min and max of the bbox
      • Need to use integer values with atomics
  • Use the stored bbox in later passes
    • GPU physics (collision)
    • Tile based processing
Bounding Box: HLSL Code

void UAVBBoxSkinVS(VSSkinnedIn input, uint id : SV_VERTEXID)
{
    // skin the vertex
    . . .

    // output the max and min for the bounding box
    int x = (int)(vSkinned.Pos.x * FLOAT_SCALE); // convert to integer
    int y = (int)(vSkinned.Pos.y * FLOAT_SCALE);
    int z = (int)(vSkinned.Pos.z * FLOAT_SCALE);

    InterlockedMin(g_BBoxUAV[0], x);
    InterlockedMin(g_BBoxUAV[1], y);
    InterlockedMin(g_BBoxUAV[2], z);

    InterlockedMax(g_BBoxUAV[3], x);
    InterlockedMax(g_BBoxUAV[4], y);
    InterlockedMax(g_BBoxUAV[5], z);

    . . .
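The same fixed-point min/max accumulation can be modeled on the CPU with `std::atomic`. This is a sketch under stated assumptions: the scale value `kFloatScale` is hypothetical, the six slots are assumed to be initialized to the largest and smallest integers before any point is added, and the compare-exchange loops stand in for the GPU's InterlockedMin and InterlockedMax.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

constexpr float kFloatScale = 1000.0f;  // assumed fixed-point scale (hypothetical value)

struct AtomicBBox {
    std::atomic<int> v[6];  // [0..2] = min x,y,z   [3..5] = max x,y,z
    AtomicBBox() {
        for (int i = 0; i < 3; ++i) v[i].store(INT_MAX);  // mins start high
        for (int i = 3; i < 6; ++i) v[i].store(INT_MIN);  // maxes start low
    }
};

// Atomic minimum implemented with a compare-exchange loop.
void AtomicMin(std::atomic<int>& target, int value) {
    int current = target.load();
    while (value < current && !target.compare_exchange_weak(current, value)) {}
}

// Atomic maximum implemented with a compare-exchange loop.
void AtomicMax(std::atomic<int>& target, int value) {
    int current = target.load();
    while (value > current && !target.compare_exchange_weak(current, value)) {}
}

// Converts the position to scaled integers and merges it into the box.
void AddPoint(AtomicBBox& box, float x, float y, float z) {
    const int p[3] = { static_cast<int>(x * kFloatScale),
                       static_cast<int>(y * kFloatScale),
                       static_cast<int>(z * kFloatScale) };
    for (int i = 0; i < 3; ++i) {
        AtomicMin(box.v[i], p[i]);
        AtomicMax(box.v[i + 3], p[i]);
    }
}
```

A later pass would divide the stored integers by the same scale to recover floating-point bounds, so the scale trades range for precision.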

Particle System UAV
  • Single pass GPU-only particle system
  • In the VS:
    • Generate sprites for rendering
    • Do Euler integration and update the particle system state to a UAV
Particle System: HLSL Code

uint particleIndex = id / 4;
uint vertexInQuad = id % 4;

// calculate the new position of the vertex
float3 oldPosition = g_bufPosColor[particleIndex].pos.xyz;
float3 oldVelocity = g_bufPosColor[particleIndex].velocity.xyz;

// Euler integration to find new position and velocity
float3 acceleration = normalize(oldVelocity) * ACCELLERATION;
float3 newVelocity = acceleration * g_deltaT + oldVelocity;
float3 newPosition = newVelocity * g_deltaT + oldPosition;

g_particleUAV[particleIndex].pos = float4(newPosition, 1.0);
g_particleUAV[particleIndex].velocity = float4(newVelocity, 0.0);

// Generate sprite vertices
. . .
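The integration step can be reproduced on the CPU to sanity-check values. The sketch below mirrors the update order in the shader code (velocity first, then position from the new velocity). The names `ParticleState` and `EulerStep` are hypothetical, and the guard against a zero-length velocity is an addition that the shader code does not have.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

struct ParticleState {
    Vec3 pos;
    Vec3 velocity;
};

// One integration step: accelerate along the current velocity direction,
// then advance the position using the updated velocity.
ParticleState EulerStep(const ParticleState& old, float accelMagnitude, float deltaT)
{
    const Vec3& v = old.velocity;
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    // Assumption: a particle at rest gets no acceleration instead of a NaN direction.
    const float s = (len > 0.0f) ? accelMagnitude / len : 0.0f;
    const Vec3 accel = { v.x * s, v.y * s, v.z * s };

    ParticleState next;
    next.velocity = { v.x + accel.x * deltaT,
                      v.y + accel.y * deltaT,
                      v.z + accel.z * deltaT };
    next.pos = { old.pos.x + next.velocity.x * deltaT,
                 old.pos.y + next.velocity.y * deltaT,
                 old.pos.z + next.velocity.z * deltaT };
    return next;
}
```

Starting at the origin with velocity (2, 0, 0), an acceleration magnitude of 4, and a time step of 0.5, the step yields velocity (4, 0, 0) and position (2, 0, 0).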

Conclusion
  • Vertex shader “tricks” can be more efficient than more commonly used methods
    • Use SV_VertexID for smarter instancing
      • Sprites
      • Merge Instancing
    • UAVs add lots of freedom to vertex shaders
      • Bounding box calculation
      • Single pass VS particle system
Demos
  • Particle System
  • UAV Skinning
    • Bbox
Acknowledgements
  • Merge Instancing
    • Emil Persson, “Graphics Gems for Games” SIGGRAPH 2011
    • Brendan Jackson, Cloud Imperium
  • Thanks to
    • Nick Thibieroz, AMD
    • Raul Aguaviva (particle system UAV), AMD
    • Alex Kharlamov, AMD