OIT and Indirect Illumination using DX11 Linked Lists

OIT and Indirect Illumination using DX11 Linked Lists Holger Gruen AMD ISV Relations Nicolas Thibieroz AMD ISV Relations

Agenda • Introduction • Linked List Rendering • Order Independent Transparency • Indirect Illumination • Q&A

Introduction • Direct3D 11 HW opens the door to many new rendering algorithms • In particular per pixel linked lists allow for a number of new techniquesOIT, Indirect Shadows, Ray Tracing of dynamic scenes, REYES surface dicing, custom AA, Irregular Z-buffering, custom blending, Advanced Depth of Field, etc. • This talk will walk you through:A DX11 implementation of per-pixel linked list and two effects that utilize this techique • OIT • Indirect Illumination

Per-Pixel Linked Lists with Direct3D 11 Nicolas Thibieroz European ISV Relations AMD Element Element Element Element Link Link Link Link

Why Linked Lists? • Data structure useful for programming • Very hard to implement efficiently with previous real-time graphics APIs • DX11 allows efficient creation and parsing of linked lists • Per-pixel linked lists • A collection of linked lists enumerating all pixels belonging to the same screen position Element Element Element Element Link Link Link Link

Two-step process • 1) Linked List Creation • Store incoming fragments into linked lists • 2) Rendering from Linked List • Linked List traversal and processing of stored fragments

Creating Per-Pixel Linked Lists

PS5.0 and UAVs • Uses a Pixel Shader 5.0 to store fragments into linked lists • Not a Compute Shader 5.0! • Uses atomic operations • Two UAV buffers required • - “Fragment & Link” buffer • - “Start Offset” buffer UAV = Unordered Access View

Fragment & Link Buffer • The “Fragment & Link” buffer contains data and link for all fragments to store • Must be large enough to store all fragments • Created with Counter support • D3D11_BUFFER_UAV_FLAG_COUNTER flag in UAV view • Declaration: structFragmentAndLinkBuffer_STRUCT { FragmentData_STRUCTFragmentData; // Fragment data uintuNext; // Link to next fragment }; RWStructuredBuffer <FragmentAndLinkBuffer_STRUCT> FLBuffer;

Start Offset Buffer • The “Start Offset” buffer contains the offset of the last fragment written at every pixel location • Screen-sized:(width * height * sizeof(UINT32) ) • Initialized to magic value (e.g. -1) • Magic value indicates no more fragments are stored (i.e. end of the list) • Declaration: RWByteAddressBufferStartOffsetBuffer;

Linked List Creation (1) • No color Render Target bound! • No rendering yet, just storing in L.L. • Depth buffer bound if needed • OIT will need it in a few slides • UAVs bounds as input/output: • StartOffsetBuffer (R/W) • FragmentAndLinkBuffer (W)

Linked List Creation (2a) Start Offset Buffer -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Viewport -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 Fragment and Link Buffer Counter = Fragment and Link Buffer Fragment and Link Buffer Fragment and Link Buffer

Linked List Creation (2b) Start Offset Buffer -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Viewport -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 0 Fragment and Link Buffer Counter = Fragment and Link Buffer -1 Fragment and Link Buffer -1 Fragment and Link Buffer

Linked List Creation (2c) Start Offset Buffer -1 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Viewport -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 3 2 Fragment and Link Buffer Counter = Fragment and Link Buffer -1 Fragment and Link Buffer -1 -1 Fragment and Link Buffer

Linked List Creation (2d) Start Offset Buffer -1 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Viewport -1 -1 -1 -1 -1 -1 -1 -1 -1 1 2 -1 -1 -1 -1 -1 -1 -1 4 3 5 Fragment and Link Buffer Counter = -1 -1 -1 0 Fragment and Link Buffer -1 Fragment and Link Buffer

Linked List Creation - Code float PS_StoreFragments(PS_INPUT input) : SV_Target { // Calculate fragment data (color, depth, etc.) FragmentData_STRUCTFragmentData = ComputeFragment(); // Retrieve current pixel count and increase counter uintuPixelCount = FLBuffer.IncrementCounter(); // Exchange offsets in StartOffsetBuffer uintvPos = uint(input.vPos); uintuStartOffsetAddress= 4 * ( (SCREEN_WIDTH*vPos.y) + vPos.x ); uintuOldStartOffset; StartOffsetBuffer.InterlockedExchange(uStartOffsetAddress, uPixelCount, uOldStartOffset); // Add new fragment entry in Fragment & Link Buffer FragmentAndLinkBuffer_STRUCT Element; Element.FragmentData = FragmentData; Element.uNext = uOldStartOffset; FLBuffer[uPixelCount] = Element; }

Traversing Per-Pixel Linked Lists

Rendering Pixels (1) • “Start Offset” Buffer and “Fragment & Link” Buffer now bound as SRV Buffer<uint> StartOffsetBufferSRV; StructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBufferSRV; • Render a fullscreen quad • For each pixel, parse the linked list and retrieve fragments for this screen position • Process list of fragments as required • Depends on algorithm • e.g. sorting, finding maximum, etc. SRV = Shader Resource View

Rendering from Linked List Start Offset Buffer -1 -1 -1 -1 -1 -1 -1 3 3 4 -1 -1 -1 -1 -1 -1 -1 -1 -1 Render Target -1 -1 -1 -1 -1 -1 -1 -1 -1 1 2 -1 -1 -1 -1 -1 -1 -1 Fragment and Link Buffer -1 -1 -1 -1 0 0 Fragment and Link Buffer -1 Fragment and Link Buffer

Rendering Pixels (2) float4 PS_RenderFragments(PS_INPUT input) : SV_Target { // Calculate UINT-aligned start offset buffer address uintvPos = uint(input.vPos); uintuStartOffsetAddress = SCREEN_WIDTH*vPos.y + vPos.x; // Fetch offset of first fragment for current pixel uintuOffset = StartOffsetBufferSRV.Load(uStartOffsetAddress); // Parse linked list for all fragments at this position float4 FinalColor=float4(0,0,0,0); while (uOffset!=0xFFFFFFFF) // 0xFFFFFFFF is magic value { // Retrieve pixel at current offset Element=FLBufferSRV[uOffset]; // Process pixel as required ProcessPixel(Element, FinalColor); // Retrieve next offset uOffset = Element.uNext; } return (FinalColor); }

Order-Independent Transparency via Per-Pixel Linked Lists Nicolas Thibieroz European ISV Relations AMD

Description • Straight application of the linked list algorithm • Stores transparent fragments into PPLL • Rendering phase sorts pixels in a back-to-front order and blends them manually in a pixel shader • Blend mode can be unique per-pixel! • Special case for MSAA support

Linked List Structure • Optimize performance by reducing amount of data to write to/read from UAV • E.g. uint instead of float4 for color • Example data structure for OIT: structFragmentAndLinkBuffer_STRUCT { uintuPixelColor; // Packed pixel color uintuDepth; // Pixel depth uintuNext; // Address of next link }; • May also get away with packed color and depth into the same uint! (if same alpha) • 16 bits color (565) + 16 bits depth • Performance/memory/quality trade-off

Visible Fragments Only! • Use [earlydepthstencil] in front of Linked List creation pixel shader • This ensures only transparent fragments that pass the depth test are stored • i.e. Visible fragments! • Allows performance savings and rendering correctness! [earlydepthstencil] float PS_StoreFragments(PS_INPUT input) : SV_Target { ... }

Sorting Pixels • Sorting in place requires R/W access to Linked List • Sparse memory accesses = slow! • Better way is to copy all pixels into array of temp registers • Then do the sorting • Temp array declaration means a hard limit on number of pixel per screen coordinates • Required trade-off for performance

Sorting and Blending 0.95 0.93 0.87 0.98 • Blend fragments back to front in PS • Blending algorithm up to app • Example: SRCALPHA-INVSRCALPHA • Or unique per pixel! (stored in fragment data) • Background passed as input texture • Actual HW blending mode disabled Background color Temp Array Render Target PS color 0.98 0.95 0.87 0.93 -1 34 12 0

Storing Pixels for Sorting (...) static uint2 SortedPixels[MAX_SORTED_PIXELS]; // Parse linked list for all pixels at this position // and store them into temp array for later sorting intnNumPixels=0; while (uOffset!=0xFFFFFFFF) { // Retrieve pixel at current offset Element=FLBufferSRV[uOffset]; // Copy pixel data into temp array SortedPixels[nNumPixels++]= uint2(Element.uPixelColor, Element.uDepth); // Retrieve next offset [flatten]uOffset = (nNumPixels>=MAX_SORTED_PIXELS) ? 0xFFFFFFFF : Element.uNext; } // Sort pixels in-place SortPixelsInPlace(SortedPixels, nNumPixels); (...)

Pixel Blending in PS (...) // Retrieve current color from background texture float4 vCurrentColor=BackgroundTexture.Load(int3(vPos.xy, 0)); // Rendering pixels using SRCALPHA-INVSRCALPHA blending for (int k=0; k<nNumPixels; k++) { // Retrieve next unblended furthermost pixel float4 vPixColor= UnpackFromUint(SortedPixels[k].x); // Manual blending between current fragment and previous one vCurrentColor.xyz= lerp(vCurrentColor.xyz, vPixColor.xyz, vPixColor.w); } // Return manually-blended color return vCurrentColor; }

OIT via Per-Pixel Linked Lists with MSAA Support

Sample Coverage • Storing individual samples into Linked Lists requires a huge amount of memory • ... and performance will suffer! • Solution is to store transparent pixels into PPLL as before • But including sample coverage too! • Requires as many bits as MSAA mode • Declare SV_COVERAGE in PS structure struct PS_INPUT { float3 vNormal : NORMAL; float2 vTex : TEXCOORD; float4 vPos : SV_POSITION; uintuCoverage : SV_COVERAGE; }

Linked List Structure • Almost unchanged from previously • Depth is now packed into 24 bits • 8 Bits are used to store coverage structFragmentAndLinkBuffer_STRUCT { uintuPixelColor; // Packed pixel color uintuDepthAndCoverage; // Depth + coverage uintuNext; // Address of next link };

Sample Coverage Example • Third sample is covered • uCoverage = 0x04 (0100 in binary) Element.uDepthAndCoverage = ( In.vPos.z*(2^24-1) << 8 ) | In.uCoverage; Pixel Center Sample

Rendering Samples (1) • Rendering phase needs to be able to write individual samples • Thus PS is run at sample frequency • Can be done by declaring SV_SAMPLEINDEX in input structure • Parse linked list and store pixels into temp array for later sorting • Similar to non-MSAA case • Difference is to only store sample if coverage matches sample index being rasterized

Rendering Samples (2) static uint2 SortedPixels[MAX_SORTED_PIXELS]; // Parse linked list for all pixels at this position // and store them into temp array for later sorting intnNumPixels=0; while (uOffset!=0xFFFFFFFF) { // Retrieve pixel at current offset Element=FLBufferSRV[uOffset]; // Retrieve pixel coverage from linked list element uintuCoverage=UnpackCoverage(Element.uDepthAndCoverage); if ( uCoverage & (1<<In.uSampleIndex) ) { // Coverage matches current sample so copy pixel SortedPixels[nNumPixels++]=Element; } // Retrieve next offset [flatten]uOffset = (nNumPixels>=MAX_SORTED_PIXELS) ? 0xFFFFFFFF : Element.uNext; }

DEMO OIT Linked List Demo

Holger GruenEuropean ISV Relations AMD Direct3D 11 Indirect Illumination

Indirect Illumination Introduction 1 • Real-time Indirect illumination is an active research topic • Numerous approaches existReflective Shadow Maps (RSM) [Dachsbacher/Stammiger05]Splatting Indirect Illumination [Dachsbacher/Stammiger2006]Multi-Res Splatting of Illumination [Wyman2009]Light propagation volumes [Kapalanyan2009]Approximating Dynamic Global Illumination in Image Space [Ritschel2009] Only a few support indirect shadowsImperfect Shadow Maps [Ritschel/Grosch2008]Micro-Rendering for Scalable, Parallel Final Gathering(SSDO) [Ritschel2010] Cascaded light propagation volumes for real-time indirect illumination [Kapalanyan/Dachsbacher2010] • Most approaches somehow extend to multi-bounce lighting

Indirect Illumination Introduction 2 • This section will coverAn efficient and simple DX9-compliant RSM based implementation for smooth one bounce indirect illumination • Indirect shadows are ignored here • A Direct3D 11 technique that traces rays to compute indirect shadows • Part of this technique could generally be used for ray-tracing dynamic scenes

Indirect Illumination w/o Indirect Shadows • Draw scene g-buffer • Draw Reflective Shadowmap (RSM) • RSM shows the part of the scene that recieves direct light from the light source • Draw Indirect Light buffer at ¼ res • RSM texels are used as light sources on g-buffer pixels for indirect lighting • Upsample Indirect Light (IL) • Draw final image adding IL

Step 1 • G-Buffer needs to allow reconstruction of • World/Camera space position • World/Camera space normal • Color/ Albedo • DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows

Step 2 • RSM needs to allow reconstruction of • World space position • World space normal • Color/ Albedo • Only draw emitters of indirect light • DXGI_FORMAT_R32G32B32A32_FLOAT position may be required for ray precise queries for indirect shadows

Step 3 • Render a ¼ res IL as a deferred op • Transform g-buffer pix to RSM space • ->Light Space->project to RSM texel space • Use a kernel of RSM texels as light sources • RSM texels also called Virtual Point Light(VPL) • Kernel size depends on • Desired speed • Desired look of the effect • RSM resolution

Computing IL at a G-buf Pixel 1 Sum up contribution of all VPLs in the kernel

Computing IL at a G-buf Pixel 2 RSM texel/VPL g-buffer pixel This term is very similar to terms used in radiosity form factor computations

Computing IL at a G-buf Pixel 3 A naive solution for smooth IL needs to consider four VPL kernels with centers at t0, t1, t2 and t3. stx : sub RSM texel x position [0.0, 1.0[ sty : sub RSM texel y position [0.0, 1.0[

Computing IL at a g-buf pixel 4 IndirectLight = (1.0f-sty) * ((1.0f-stx) * + stx * ) + (0.0f+sty) * ((1.0f-stx) * + stx * ) Evaluation of 4 big VPL kernels is slow  VPL kernel at t0 stx : sub texel x position [0.0, 1.0[ VPL kernel at t2 sty : sub texel y position [0.0, 1.0[ VPL kernel at t1 VPL kernel at t3

Computing IL at a g-buf pixel 5 SmoothIndirectLight = (1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1)+ (0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7)+B4 stx : sub RSM texel x position of g-buf pix [0.0, 1.0[ sty : sub RSM texel y position of g-buf pix [0.0, 1.0[ This trick is probably known to some of you already. See backup for a detailed explanation !

Indirect Light Buffer

Step 4 • Indirect Light buffer is ¼ res • Perform a bilateral upsampling step • SeePeter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder. "Image-Based Proxy Accumulation for Real-Time Soft Global Illumination". Pacific Graphics 2007 • Result is a full resolution IL

OIT and Indirect Illumination using DX11 Linked Lists