OIT and Indirect Illumination using DX11 Linked Lists

Holger Gruen AMD ISV Relations

Nicolas Thibieroz AMD ISV Relations


Agenda

  • Introduction

  • Linked List Rendering

  • Order Independent Transparency

  • Indirect Illumination

  • Q&A


Introduction

  • Direct3D 11 HW opens the door to many new rendering algorithms

  • In particular, per-pixel linked lists allow for a number of new techniques: OIT, indirect shadows, ray tracing of dynamic scenes, REYES surface dicing, custom AA, irregular Z-buffering, custom blending, advanced depth of field, etc.

  • This talk will walk you through a DX11 implementation of per-pixel linked lists and two effects that utilize this technique:

    • OIT

    • Indirect Illumination


Per-Pixel Linked Lists with Direct3D 11

Nicolas Thibieroz

European ISV Relations

AMD


Why Linked Lists?

  • Data structure useful for programming

  • Very hard to implement efficiently with previous real-time graphics APIs

  • DX11 allows efficient creation and parsing of linked lists

  • Per-pixel linked lists

    • A collection of linked lists enumerating all pixels belonging to the same screen position

[Diagram: a chain of Element nodes, each holding a Link to the next element in the list]

Two-step process

  • 1) Linked List Creation

    • Store incoming fragments into linked lists

  • 2) Rendering from Linked List

    • Linked List traversal and processing of stored fragments


Creating Per-Pixel Linked Lists


PS5.0 and UAVs

  • Uses a Pixel Shader 5.0 to store fragments into linked lists

    • Not a Compute Shader 5.0!

  • Uses atomic operations

  • Two UAV buffers required

    • “Fragment & Link” buffer

    • “Start Offset” buffer

      UAV = Unordered Access View


Fragment & Link Buffer

  • The “Fragment & Link” buffer contains data and link for all fragments to store

  • Must be large enough to store all fragments

  • Created with Counter support

    • D3D11_BUFFER_UAV_FLAG_COUNTER flag in UAV view

  • Declaration:

    struct FragmentAndLinkBuffer_STRUCT
    {
        FragmentData_STRUCT FragmentData; // Fragment data
        uint                uNext;        // Link to next fragment
    };

    RWStructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBuffer;


Start Offset Buffer

  • The “Start Offset” buffer contains the offset of the last fragment written at every pixel location

  • Screen-sized: (width * height * sizeof(UINT32))

  • Initialized to magic value (e.g. -1)

    • Magic value indicates no more fragments are stored (i.e. end of the list)

  • Declaration:

    RWByteAddressBuffer StartOffsetBuffer;
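
    As a minimal sketch (not from the original deck), the per-frame initialization to the magic value can be done with ID3D11DeviceContext::ClearUnorderedAccessViewUint, or with a small compute shader such as the hypothetical one below:

    // Hypothetical helper: reset every Start Offset Buffer entry to the
    // "empty list" magic value 0xFFFFFFFF (-1) before linked list creation.
    #define SCREEN_WIDTH  1280   // assumed resolution
    #define SCREEN_HEIGHT 1024

    RWByteAddressBuffer StartOffsetBuffer;

    [numthreads(64, 1, 1)]
    void CS_ClearStartOffsets(uint3 dtid : SV_DispatchThreadID)
    {
        if (dtid.x < SCREEN_WIDTH * SCREEN_HEIGHT)
        {
            StartOffsetBuffer.Store(dtid.x * 4, 0xFFFFFFFF);
        }
    }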


Linked List Creation (1)

  • No color Render Target bound!

    • No rendering yet, just storing in L.L.

  • Depth buffer bound if needed

    • OIT will need it in a few slides

  • UAVs bound as input/output:

    • StartOffsetBuffer (R/W)

    • FragmentAndLinkBuffer (W)


Linked List Creation (2a)

[Diagram: initial state. Every Start Offset Buffer entry in the viewport is set to -1, the Fragment and Link Buffer is empty and its Counter = 0]

Linked List Creation (2b)

[Diagram: the first incoming fragment is appended at index 0 of the Fragment and Link Buffer with its link set to -1; the covered pixel's Start Offset Buffer entry becomes 0 and the Counter advances to 1]

Linked List Creation (2c)

[Diagram: further fragments are appended at indices 1 and 2 with links of -1; their pixels' Start Offset Buffer entries are updated and the Counter advances to 3]

Linked List Creation (2d)

[Diagram: fragments from another primitive are appended at the next free indices. Where a pixel already holds a fragment, the new entry's link points to the previously stored index and the Start Offset Buffer entry is updated to the new head; the Counter advances accordingly]

Linked List Creation - Code

void PS_StoreFragments(PS_INPUT input)
{
    // Calculate fragment data (color, depth, etc.)
    FragmentData_STRUCT FragmentData = ComputeFragment();

    // Retrieve current pixel count and increase counter
    uint uPixelCount = FLBuffer.IncrementCounter();

    // Exchange offsets in StartOffsetBuffer
    uint2 vPos = uint2(input.vPos.xy);
    uint uStartOffsetAddress = 4 * ((SCREEN_WIDTH * vPos.y) + vPos.x);
    uint uOldStartOffset;
    StartOffsetBuffer.InterlockedExchange(uStartOffsetAddress,
                                          uPixelCount, uOldStartOffset);

    // Add new fragment entry in Fragment & Link Buffer
    FragmentAndLinkBuffer_STRUCT Element;
    Element.FragmentData = FragmentData;
    Element.uNext = uOldStartOffset;
    FLBuffer[uPixelCount] = Element;
}



Rendering Pixels (1)

  • “Start Offset” Buffer and “Fragment & Link” Buffer now bound as SRV

    Buffer<uint> StartOffsetBufferSRV;

    StructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBufferSRV;

  • Render a fullscreen quad

  • For each pixel, parse the linked list and retrieve fragments for this screen position

  • Process list of fragments as required

    • Depends on algorithm

    • e.g. sorting, finding maximum, etc.

      SRV = Shader Resource View


Rendering from Linked List

[Diagram: for each Render Target pixel, the Start Offset Buffer entry gives the head of that pixel's list; the Fragment and Link Buffer is then walked via the uNext links until the -1 end-of-list value is reached]

Rendering Pixels (2)

float4 PS_RenderFragments(PS_INPUT input) : SV_Target
{
    // Calculate UINT-aligned start offset buffer address
    uint2 vPos = uint2(input.vPos.xy);
    uint uStartOffsetAddress = SCREEN_WIDTH * vPos.y + vPos.x;

    // Fetch offset of first fragment for current pixel
    uint uOffset = StartOffsetBufferSRV.Load(uStartOffsetAddress);

    // Parse linked list for all fragments at this position
    float4 FinalColor = float4(0, 0, 0, 0);
    while (uOffset != 0xFFFFFFFF) // 0xFFFFFFFF is the magic end-of-list value
    {
        // Retrieve fragment at current offset
        FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

        // Process fragment as required
        ProcessPixel(Element, FinalColor);

        // Retrieve next offset
        uOffset = Element.uNext;
    }

    return FinalColor;
}


Order-Independent Transparency via Per-Pixel Linked Lists

Nicolas Thibieroz

European ISV Relations

AMD


Description

  • Straight application of the linked list algorithm

  • Stores transparent fragments into PPLL

  • Rendering phase sorts pixels in a back-to-front order and blends them manually in a pixel shader

    • Blend mode can be unique per-pixel!

  • Special case for MSAA support


Linked List Structure

  • Optimize performance by reducing the amount of data written to / read from the UAV

  • E.g. uint instead of float4 for color

  • Example data structure for OIT:

    struct FragmentAndLinkBuffer_STRUCT
    {
        uint uPixelColor; // Packed pixel color
        uint uDepth;      // Pixel depth
        uint uNext;       // Address of next link
    };

  • May also get away with packing color and depth into the same uint (if alpha is constant)!

    • 16 bits color (565) + 16 bits depth

    • Performance/memory/quality trade-off
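
As a hedged illustration of the packing bullets above (helper names are hypothetical, not from the deck), a float4 color can be packed into a single uint as RGBA8, and the more aggressive 5:6:5 color + 16-bit depth variant also fits into one uint:

// Hypothetical packing helpers for the OIT fragment data.

// Pack a float4 color into one uint (8 bits per channel).
uint PackColorRGBA8(float4 vColor)
{
    uint4 u = uint4(saturate(vColor) * 255.0f + 0.5f);
    return (u.a << 24) | (u.b << 16) | (u.g << 8) | u.r;
}

// 5:6:5 color + 16-bit depth in one uint (only viable when alpha is constant).
uint PackColor565Depth16(float3 vColor, float fDepth)
{
    uint3 c = uint3(saturate(vColor) * float3(31.0f, 63.0f, 31.0f) + 0.5f);
    uint  d = uint(saturate(fDepth) * 65535.0f);
    return (d << 16) | (c.r << 11) | (c.g << 5) | c.b;
}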


Visible Fragments Only!

  • Use [earlydepthstencil] in front of the Linked List creation pixel shader

  • This ensures only transparent fragments that pass the depth test are stored

    • i.e. Visible fragments!

  • Allows performance savings and rendering correctness!

    [earlydepthstencil]
    void PS_StoreFragments(PS_INPUT input)
    {
        ...
    }


Sorting Pixels

  • Sorting in place requires R/W access to Linked List

  • Sparse memory accesses = slow!

  • A better way is to copy all pixels into an array of temp registers

    • Then do the sorting

  • The temp array declaration imposes a hard limit on the number of pixels per screen coordinate

    • Required trade-off for performance


Sorting and Blending


  • Blend fragments back to front in PS

    • Blending algorithm up to app

    • Example: SRCALPHA-INVSRCALPHA

    • Or unique per pixel! (stored in fragment data)

  • Background passed as input texture

    • Actual HW blending mode disabled

[Diagram: the pixel's linked-list fragments (depths 0.95, 0.93, 0.87, 0.98) are copied into a temp array and sorted back to front; the PS then blends them over the background color and writes the result to the Render Target]

Storing Pixels for Sorting

(...)
static uint2 SortedPixels[MAX_SORTED_PIXELS];

// Parse linked list for all pixels at this position
// and store them into temp array for later sorting
int nNumPixels = 0;
while (uOffset != 0xFFFFFFFF)
{
    // Retrieve pixel at current offset
    FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

    // Copy pixel data into temp array
    SortedPixels[nNumPixels++] =
        uint2(Element.uPixelColor, Element.uDepth);

    // Retrieve next offset
    [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ?
        0xFFFFFFFF : Element.uNext;
}

// Sort pixels in-place
SortPixelsInPlace(SortedPixels, nNumPixels);
(...)
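
SortPixelsInPlace is not shown in the deck; a minimal sketch, assuming a simple insertion sort keyed on the depth stored in .y (adequate for the small per-pixel fragment counts the temp array allows), could be:

// Hypothetical implementation of the SortPixelsInPlace helper used above:
// sort (packed color, depth) pairs back to front, i.e. largest depth first.
void SortPixelsInPlace(inout uint2 SortedPixels[MAX_SORTED_PIXELS], int nNumPixels)
{
    for (int i = 1; i < nNumPixels; i++)
    {
        uint2 Pixel = SortedPixels[i];
        int j = i - 1;
        while (j >= 0 && SortedPixels[j].y < Pixel.y)
        {
            SortedPixels[j + 1] = SortedPixels[j];
            j--;
        }
        SortedPixels[j + 1] = Pixel;
    }
}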


Pixel Blending in PS

(...)
// Retrieve current color from background texture
float4 vCurrentColor = BackgroundTexture.Load(int3(vPos.xy, 0));

// Rendering pixels using SRCALPHA-INVSRCALPHA blending
for (int k = 0; k < nNumPixels; k++)
{
    // Retrieve next unblended furthermost pixel
    float4 vPixColor = UnpackFromUint(SortedPixels[k].x);

    // Manual blending between current fragment and previous one
    vCurrentColor.xyz = lerp(vCurrentColor.xyz, vPixColor.xyz,
                             vPixColor.w);
}

// Return manually-blended color
return vCurrentColor;
}
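
UnpackFromUint is not defined in the deck; assuming the RGBA8 packing sketched earlier, a matching hypothetical unpack could be:

// Hypothetical counterpart to the color packing: uint -> float4 RGBA color.
float4 UnpackFromUint(uint uColor)
{
    return float4(( uColor        & 0xFF) / 255.0f,
                  ((uColor >>  8) & 0xFF) / 255.0f,
                  ((uColor >> 16) & 0xFF) / 255.0f,
                  ((uColor >> 24) & 0xFF) / 255.0f);
}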



Sample Coverage

  • Storing individual samples into Linked Lists requires a huge amount of memory

    • ... and performance will suffer!

  • Solution is to store transparent pixels into PPLL as before

  • But including sample coverage too!

    • Requires as many bits as the MSAA sample count

  • Declare SV_COVERAGE in PS structure

    struct PS_INPUT
    {
        float3 vNormal   : NORMAL;
        float2 vTex      : TEXCOORD;
        float4 vPos      : SV_POSITION;
        uint   uCoverage : SV_COVERAGE;
    };


Linked List Structure

  • Almost unchanged from previously

  • Depth is now packed into 24 bits

  • 8 Bits are used to store coverage

    struct FragmentAndLinkBuffer_STRUCT
    {
        uint uPixelColor;       // Packed pixel color
        uint uDepthAndCoverage; // Depth + coverage
        uint uNext;             // Address of next link
    };


Sample Coverage Example

  • Third sample is covered

    • uCoverage = 0x04 (0100 in binary)

      Element.uDepthAndCoverage =
          ( uint(In.vPos.z * ((1 << 24) - 1)) << 8 ) | In.uCoverage;

[Diagram: a multisampled pixel; only the third sample around the pixel center is covered by the primitive]

Rendering Samples (1)

  • Rendering phase needs to be able to write individual samples

  • Thus PS is run at sample frequency

    • Can be done by declaring SV_SAMPLEINDEX in input structure

  • Parse linked list and store pixels into temp array for later sorting

    • Similar to non-MSAA case

    • Difference is to only store sample if coverage matches sample index being rasterized
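
As a small illustration of the bullets above (an assumption, not copied from the deck), running the resolve pass at sample frequency only requires a sample-index input in the pixel shader signature:

// Hypothetical input structure for the per-sample resolve pass.
struct PS_INPUT_SAMPLE
{
    float4 vPos         : SV_POSITION;
    uint   uSampleIndex : SV_SampleIndex; // forces per-sample shader execution
};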


Rendering Samples (2)

static uint2 SortedPixels[MAX_SORTED_PIXELS];

// Parse linked list for all pixels at this position
// and store them into temp array for later sorting
int nNumPixels = 0;
while (uOffset != 0xFFFFFFFF)
{
    // Retrieve pixel at current offset
    FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

    // Retrieve pixel coverage from linked list element
    uint uCoverage = UnpackCoverage(Element.uDepthAndCoverage);
    if (uCoverage & (1 << In.uSampleIndex))
    {
        // Coverage matches current sample so copy pixel
        SortedPixels[nNumPixels++] =
            uint2(Element.uPixelColor, Element.uDepthAndCoverage);
    }

    // Retrieve next offset
    [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ?
        0xFFFFFFFF : Element.uNext;
}



DEMO

OIT Linked List Demo


Direct3D 11 Indirect Illumination

Holger Gruen, European ISV Relations, AMD


Indirect Illumination Introduction 1

  • Real-time Indirect illumination is an active research topic

  • Numerous approaches exist:

    • Reflective Shadow Maps (RSM) [Dachsbacher/Stamminger 05]

    • Splatting Indirect Illumination [Dachsbacher/Stamminger 2006]

    • Multi-Res Splatting of Illumination [Wyman 2009]

    • Light Propagation Volumes [Kaplanyan 2009]

    • Approximating Dynamic Global Illumination in Image Space [Ritschel 2009]

  • Only a few support indirect shadows:

    • Imperfect Shadow Maps [Ritschel/Grosch 2008]

    • Micro-Rendering for Scalable, Parallel Final Gathering (SSDO) [Ritschel 2010]

    • Cascaded Light Propagation Volumes for Real-Time Indirect Illumination [Kaplanyan/Dachsbacher 2010]

  • Most approaches somehow extend to multi-bounce lighting


Indirect Illumination Introduction 2

  • This section will cover: An efficient and simple DX9-compliant RSM-based implementation for smooth one-bounce indirect illumination

    • Indirect shadows are ignored here

  • A Direct3D 11 technique that traces rays to compute indirect shadows

    • Part of this technique could generally be used for ray-tracing dynamic scenes


    Indirect Illumination w/o Indirect Shadows

    • Draw scene g-buffer

    • Draw Reflective Shadowmap (RSM)

      • RSM shows the part of the scene that receives direct light from the light source

    • Draw Indirect Light buffer at ¼ res

      • RSM texels are used as light sources on g-buffer pixels for indirect lighting

    • Upsample Indirect Light (IL)

    • Draw final image adding IL


    Step 1

    • G-Buffer needs to allow reconstruction of

      • World/Camera space position

      • World/Camera space normal

      • Color/ Albedo

    • DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows
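
    As a hedged sketch of step 1 (all names and formats below are assumptions, not from the deck), the g-buffer fill pass could output three render targets:

    // Hypothetical g-buffer fill pass; formats and semantics are assumptions.
    Texture2D    AlbedoTexture;
    SamplerState LinearSampler;

    struct VS_TO_PS_GBUFFER
    {
        float4 vPos      : SV_POSITION;
        float3 vWorldPos : WORLDPOS;   // interpolated world-space position
        float3 vNormal   : NORMAL;
        float2 vTex      : TEXCOORD;
    };

    struct PS_GBUFFER_OUT
    {
        float4 vPositionWS : SV_Target0; // e.g. DXGI_FORMAT_R32G32B32A32_FLOAT for precise ray queries
        float4 vNormalWS   : SV_Target1; // world-space normal
        float4 vAlbedo     : SV_Target2; // color / albedo
    };

    PS_GBUFFER_OUT PS_FillGBuffer(VS_TO_PS_GBUFFER input)
    {
        PS_GBUFFER_OUT Out;
        Out.vPositionWS = float4(input.vWorldPos, 1.0f);
        Out.vNormalWS   = float4(normalize(input.vNormal), 0.0f);
        Out.vAlbedo     = AlbedoTexture.Sample(LinearSampler, input.vTex);
        return Out;
    }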


    Step 2

    • RSM needs to allow reconstruction of

      • World space position

      • World space normal

      • Color/ Albedo

    • Only draw emitters of indirect light

    • DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows


    Step 3

    • Render a ¼ res IL as a deferred op

    • Transform g-buffer pix to RSM space

      • Transform to light space, then project to RSM texel space

    • Use a kernel of RSM texels as light sources

      • RSM texels also called Virtual Point Light(VPL)

    • Kernel size depends on

      • Desired speed

      • Desired look of the effect

      • RSM resolution


    Computing IL at a G-buf Pixel 1

    Sum up contribution of all VPLs in the kernel


    Computing IL at a G-buf Pixel 2

    [Diagram: geometric relationship between an RSM texel/VPL and the receiving g-buffer pixel]

    This term is very similar to terms used in radiosity form factor computations
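
    The form-factor-like term itself was shown as a figure; as a hedged sketch (names and the exact clamping are assumptions), a typical one-bounce VPL contribution evaluated in the ¼ res IL pass looks like this:

    // Hypothetical evaluation of one VPL's contribution to a g-buffer pixel.
    // vPixelPos/vPixelNormal come from the g-buffer; vVPLPos/vVPLNormal/vVPLFlux from the RSM.
    float3 EvaluateVPL(float3 vPixelPos, float3 vPixelNormal,
                       float3 vVPLPos,   float3 vVPLNormal, float3 vVPLFlux)
    {
        float3 vDir     = vVPLPos - vPixelPos;
        float  fDistSqr = max(dot(vDir, vDir), 1e-4f); // clamp to avoid the singularity
        vDir            = normalize(vDir);

        // Form-factor-like geometric term between receiving pixel and VPL
        float fGeom = saturate(dot(vPixelNormal, vDir)) *
                      saturate(dot(vVPLNormal, -vDir)) / fDistSqr;

        return vVPLFlux * fGeom;
    }

    // The IL pass sums this term over every RSM texel/VPL in the kernel:
    //   IndirectLight += EvaluateVPL(...);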


    Computing IL at a G-buf Pixel 3

    A naive solution for smooth IL needs to consider four VPL kernels with centers at t0, t1, t2 and t3.

    stx : sub RSM texel x position [0.0, 1.0[

    sty : sub RSM texel y position [0.0, 1.0[


    Computing IL at a g-buf pixel 4

    IndirectLight =
        (1.0f - sty) * ((1.0f - stx) * K(t0) + stx * K(t1)) +
        (0.0f + sty) * ((1.0f - stx) * K(t2) + stx * K(t3))

    where K(ti) is the summed contribution of the VPL kernel centered at ti.

    Evaluation of 4 big VPL kernels is slow.

    stx : sub RSM texel x position [0.0, 1.0[
    sty : sub RSM texel y position [0.0, 1.0[


    Computing IL at a g-buf pixel 5

    SmoothIndirectLight =

    (1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1)+

    (0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7)+B4

    stx : sub RSM texel x position of g-buf pix [0.0, 1.0[

    sty : sub RSM texel y position of g-buf pix [0.0, 1.0[

    This trick is probably known to some of you already. See backup for a detailed explanation!



    Step 4

    • Indirect Light buffer is ¼ res

    • Perform a bilateral upsampling step

      • See Peter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder. "Image-Based Proxy Accumulation for Real-Time Soft Global Illumination". Pacific Graphics 2007

    • Result is a full resolution IL
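
    Bilateral upsampling is not spelled out in the deck; a minimal sketch, assuming low-res normal + depth data is available alongside the ¼ res IL buffer (all names are hypothetical), could weight the four nearest low-res samples by geometric similarity:

    // Hypothetical bilateral upsampling of the 1/4 res indirect light buffer.
    Texture2D<float4> LowResIL;          // 1/4 res indirect light
    Texture2D<float4> LowResNormalDepth; // low-res normal.xyz + linear depth in .w
    SamplerState      PointSampler;

    static const float2 LOWRES_TEXEL = float2(4.0f / 1280.0f, 4.0f / 1024.0f); // assumed

    float4 BilateralUpsample(float2 vUV, float3 vHiNormal, float fHiDepth)
    {
        const float2 vOffsets[4] = { float2(-0.5f, -0.5f), float2( 0.5f, -0.5f),
                                     float2(-0.5f,  0.5f), float2( 0.5f,  0.5f) };
        float4 vSum    = 0;
        float  fWeight = 0;

        for (int i = 0; i < 4; i++)
        {
            float2 vSampleUV = vUV + vOffsets[i] * LOWRES_TEXEL;
            float4 vND       = LowResNormalDepth.SampleLevel(PointSampler, vSampleUV, 0);

            // Favor low-res samples whose normal and depth match the full-res pixel
            float w = pow(saturate(dot(vND.xyz, vHiNormal)), 8.0f) /
                      (1e-3f + abs(vND.w - fHiDepth));

            vSum    += w * LowResIL.SampleLevel(PointSampler, vSampleUV, 0);
            fWeight += w;
        }
        return vSum / max(fWeight, 1e-4f);
    }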


    Step 5

    • Combine

      • Direct Illumination

      • Indirect Illumination

      • Shadows (not mentioned)


    Combined Image

    DEMO

    ~280 FPS on a HD5970 @ 1280x1024

    for a 15x15 VPL kernel

    Deferred IL pass + bilateral upsampling costs ~2.5 ms


    How to add Indirect Shadows

    • Use a CS and the linked lists technique

      • Insert blockers of IL into a 3D grid of lists – let's use triangles for now

        • see backup for alternative data structure

    • Look at a kernel of VPLs again

    • Only accumulate light of VPLs that are occluded by blocker tris

      • Trace rays through 3d grid to detect occluded VPLs

      • Render low res buffer only

    • Subtract blocked indirect light from IL buffer

      • Subtract a blurred version of low res version

        • Blur is combined bilateral blurring/upsampling



    Insert tris into 3D grid of triangle lists

    Rasterize dynamic blockers into the 3D grid using a CS and atomics

    [Diagram: a world-space 3D grid of per-cell triangle lists around the IL blockers, laid out in a UAV; eol = end of list (0xffffffff)]
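
    The insertion kernel itself is not shown; below is a minimal sketch, assuming a uniform world-space grid, a per-cell head buffer and a triangle-node buffer, and simplifying to one cell per triangle (a real implementation would insert the triangle into every cell it overlaps). All names are hypothetical.

    // Hypothetical compute shader inserting blocker triangles into the 3D grid of
    // lists – the same append/exchange scheme as the per-pixel linked lists, in 3D.
    #define GRID_SIZE 64 // assumed grid resolution

    struct TriangleNode_STRUCT
    {
        uint uTriangleIndex; // index into the blocker triangle buffer
        uint uNext;          // next node in this cell's list (0xffffffff = eol)
    };

    StructuredBuffer<float3>                TriangleCentroids;     // assumed pre-computed per triangle
    RWByteAddressBuffer                     GridStartOffsetBuffer; // GRID_SIZE^3 entries, cleared to 0xffffffff
    RWStructuredBuffer<TriangleNode_STRUCT> TriangleNodeBuffer;    // created with counter support

    cbuffer GridCB
    {
        float3 g_vGridOrigin;
        float  g_fCellSize;
        uint   g_uNumTriangles;
    };

    [numthreads(64, 1, 1)]
    void CS_InsertTriangles(uint3 dtid : SV_DispatchThreadID)
    {
        uint uTri = dtid.x;
        if (uTri >= g_uNumTriangles)
            return;

        // Map the triangle (here: just its centroid) to a grid cell
        float3 vLocal = (TriangleCentroids[uTri] - g_vGridOrigin) / g_fCellSize;
        uint3  vCell  = uint3(clamp(vLocal, 0.0f, GRID_SIZE - 1.0f));
        uint   uCellAddress = 4 * (vCell.z * GRID_SIZE * GRID_SIZE + vCell.y * GRID_SIZE + vCell.x);

        // Append a node and link it into the cell's list, exactly like the PPLL case
        uint uNodeIndex = TriangleNodeBuffer.IncrementCounter();
        uint uOldStart;
        GridStartOffsetBuffer.InterlockedExchange(uCellAddress, uNodeIndex, uOldStart);

        TriangleNode_STRUCT Node;
        Node.uTriangleIndex = uTri;
        Node.uNext          = uOldStart;
        TriangleNodeBuffer[uNodeIndex] = Node;
    }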



    Indirect Light Buffer

    [Screenshot: an emitter of green light, a blocker of that light and the expected indirect shadow region in the Indirect Light buffer]





    Final Image

    DEMO

    ~70 FPS on a HD5970 @ 1280x1024

    ~300 million rays per second for Indirect Shadows

    Ray casting costs ~9 ms


    Future directions

    • Speed up IL rendering

      • Render IL at even lower res

      • Look into multi-res RSMs

  • Speed up ray-tracing

    • Per pixel array of lists for depth buckets

    • Other data structures

  • Raytrace other primitive types

    • Splats, fuzzy ellipsoids etc.

    • Proxy geometry or bounding volumes of blockers

  • Get rid of Interlocked*() ops

    • Just mark grid cells as occupied

    • Lower quality but could work on earlier hardware

    • Need to solve self-occlusion issues


  • Q&A

    Holger Gruen [email protected]

    Nicolas Thibieroz [email protected]

    • Credits for the basic idea of how to implement PPLL under Direct3D 11 go to Jakub Klarowicz (Techland), Holger Gruen and Nicolas Thibieroz (AMD)

