OIT and Indirect Illumination using DX11 Linked Lists

Holger Gruen, AMD ISV Relations

Nicolas Thibieroz, AMD ISV Relations


Agenda

  • Introduction

  • Linked List Rendering

  • Order Independent Transparency

  • Indirect Illumination

  • Q&A


Introduction

  • Direct3D 11 HW opens the door to many new rendering algorithms

  • In particular, per-pixel linked lists allow for a number of new techniques: OIT, indirect shadows, ray tracing of dynamic scenes, REYES surface dicing, custom AA, irregular Z-buffering, custom blending, advanced depth of field, etc.

  • This talk will walk you through a DX11 implementation of per-pixel linked lists and two effects that utilize this technique:

    • OIT

    • Indirect Illumination


Per-Pixel Linked Lists with Direct3D 11

Nicolas Thibieroz

European ISV Relations

AMD

[Figure: a chain of four list elements, each followed by a link to the next]


Why Linked Lists?

  • Data structure useful for programming

  • Very hard to implement efficiently with previous real-time graphics APIs

  • DX11 allows efficient creation and parsing of linked lists

  • Per-pixel linked lists

    • A collection of linked lists, one per screen position, each enumerating all fragments that land on that position



Two-step process

  • 1) Linked List Creation

    • Store incoming fragments into linked lists

  • 2) Rendering from Linked List

    • Linked List traversal and processing of stored fragments


Creating Per-Pixel Linked Lists


PS5.0 and UAVs

  • Uses a Pixel Shader 5.0 to store fragments into linked lists

    • Not a Compute Shader 5.0!

  • Uses atomic operations

  • Two UAV buffers required

    • “Fragment & Link” buffer

    • “Start Offset” buffer

      UAV = Unordered Access View


Fragment & Link Buffer

  • The “Fragment & Link” buffer contains data and link for all fragments to store

  • Must be large enough to store all fragments

  • Created with Counter support

    • D3D11_BUFFER_UAV_FLAG_COUNTER flag in UAV view

  • Declaration:

    struct FragmentAndLinkBuffer_STRUCT
    {
        FragmentData_STRUCT FragmentData;  // Fragment data
        uint                uNext;         // Link to next fragment
    };

    RWStructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBuffer;


Start Offset Buffer

  • The “Start Offset” buffer contains the offset of the last fragment written at every pixel location

  • Screen-sized: width * height * sizeof(UINT32) bytes

  • Initialized to magic value (e.g. -1, i.e. 0xFFFFFFFF when read as a uint)

    • Magic value indicates no more fragments are stored (i.e. end of the list)

  • Declaration:

    RWByteAddressBuffer StartOffsetBuffer;
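A minimal CPU-side sketch of the two buffers may help to picture their sizes and initial contents. It is plain C++, not Direct3D code: `PerPixelLists`, `MakeLists` and `maxFragments` are hypothetical names, and the fragment payload is reduced to a single uint for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// "Magic value" marking the end of a list: -1 reinterpreted as a 32-bit uint.
constexpr std::uint32_t kEndOfList = 0xFFFFFFFFu;

// CPU stand-in for FragmentAndLinkBuffer_STRUCT; the payload is reduced to one uint.
struct FragmentAndLink {
    std::uint32_t fragmentData;
    std::uint32_t next;
};

struct PerPixelLists {
    std::vector<std::uint32_t> startOffsets;  // one list head per pixel
    std::vector<FragmentAndLink> fragments;   // node pool shared by all pixels
    std::uint32_t counter = 0;                // stands in for the UAV counter
};

// maxFragments is a hypothetical, application-chosen budget for the node pool.
PerPixelLists MakeLists(std::uint32_t width, std::uint32_t height, std::size_t maxFragments) {
    assert(width > 0 && height > 0);
    PerPixelLists lists;
    lists.startOffsets.assign(static_cast<std::size_t>(width) * height, kEndOfList);
    lists.fragments.resize(maxFragments);
    return lists;
}
```

For example, `MakeLists(1280, 1024, n)` would give a start offset array of 1280 * 1024 entries (4 bytes each), all set to the end-of-list value.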


Linked List Creation (1)

  • No color Render Target bound!

    • No rendering yet, just storing in L.L.

  • Depth buffer bound if needed

    • OIT will need it in a few slides

  • UAVs bound as input/output:

    • StartOffsetBuffer (R/W)

    • FragmentAndLinkBuffer (W)


Linked List Creation (2a)

[Figure: initial state. Every entry of the viewport-sized Start Offset Buffer holds -1, the Fragment and Link Buffer is empty, and Counter = 0.]


Linked List Creation (2b)

[Figure: after the first fragment is stored, Counter = 1, the covered pixel's entry in the Start Offset Buffer holds 0, and fragment 0 in the Fragment and Link Buffer links to -1.]


Linked List Creation (2c)

[Figure: after three fragments on three different pixels, Counter = 3, the covered pixels hold offsets 0, 1 and 2 in the Start Offset Buffer, and all three fragments link to -1.]


Linked List Creation (2d)

[Figure: after five fragments, Counter = 5. Fragment 3 landed on the pixel that already held fragment 0, so that pixel's start offset becomes 3 and fragment 3 links to 0; the other fragments link to -1.]


Linked List Creation - Code

float PS_StoreFragments(PS_INPUT input) : SV_Target
{
    // Calculate fragment data (color, depth, etc.)
    FragmentData_STRUCT FragmentData = ComputeFragment();

    // Retrieve current pixel count and increase counter
    uint uPixelCount = FLBuffer.IncrementCounter();

    // Exchange offsets in StartOffsetBuffer (byte address = 4 * pixel index)
    uint2 vPos = uint2(input.vPos.xy);
    uint uStartOffsetAddress = 4 * ( (SCREEN_WIDTH * vPos.y) + vPos.x );
    uint uOldStartOffset;
    StartOffsetBuffer.InterlockedExchange(uStartOffsetAddress, uPixelCount, uOldStartOffset);

    // Add new fragment entry in Fragment & Link Buffer
    FragmentAndLinkBuffer_STRUCT Element;
    Element.FragmentData = FragmentData;
    Element.uNext        = uOldStartOffset;
    FLBuffer[uPixelCount] = Element;

    // No color render target is bound, so the returned value is discarded
    return 0.0f;
}
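The insertion logic above can be mimicked on the CPU with `std::atomic`, which may make the counter and exchange steps easier to follow. This is only a sketch under assumptions: `Lists`, `Node` and `StoreFragment` are hypothetical names, the payload is a single uint, and the overflow check does not appear on the slide.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kEndOfList = 0xFFFFFFFFu;

struct Node {
    std::uint32_t data;  // stands in for FragmentData_STRUCT
    std::uint32_t next;  // offset of the next node, or kEndOfList
};

struct Lists {
    std::uint32_t width;
    std::vector<std::atomic<std::uint32_t>> heads;  // "Start Offset" buffer
    std::vector<Node> nodes;                        // "Fragment & Link" buffer
    std::atomic<std::uint32_t> counter{0};          // UAV counter

    Lists(std::uint32_t w, std::uint32_t h, std::size_t maxNodes)
        : width(w), heads(static_cast<std::size_t>(w) * h), nodes(maxNodes) {
        for (auto& head : heads) head.store(kEndOfList);
    }
};

// Returns false if the node pool is full; that overflow check is an addition,
// the shader on the slide assumes the buffer is large enough.
bool StoreFragment(Lists& lists, std::uint32_t x, std::uint32_t y, std::uint32_t data) {
    const std::size_t pixel = static_cast<std::size_t>(y) * lists.width + x;
    assert(x < lists.width && pixel < lists.heads.size());
    const std::uint32_t slot = lists.counter.fetch_add(1);            // IncrementCounter()
    if (slot >= lists.nodes.size()) return false;
    const std::uint32_t oldHead = lists.heads[pixel].exchange(slot);  // InterlockedExchange()
    lists.nodes[slot] = Node{data, oldHead};
    return true;
}
```

As in the shader, the new head is published before the node itself is written. That ordering is only safe because the lists are read in a later pass, after all fragments have been stored.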


Traversing Per-Pixel Linked Lists


Rendering Pixels (1)

  • “Start Offset” Buffer and “Fragment & Link” Buffer now bound as SRV

    Buffer<uint> StartOffsetBufferSRV;

    StructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBufferSRV;

  • Render a fullscreen quad

  • For each pixel, parse the linked list and retrieve fragments for this screen position

  • Process list of fragments as required

    • Depends on algorithm

    • e.g. sorting, finding maximum, etc.

      SRV = Shader Resource View


Rendering from Linked List

[Figure: traversal. For each render target pixel, the shader reads that pixel's entry in the Start Offset Buffer and follows the links through the Fragment and Link Buffer until it reaches -1.]


Rendering Pixels (2)

float4 PS_RenderFragments(PS_INPUT input) : SV_Target
{
    // Calculate UINT-aligned start offset buffer address
    uint2 vPos = uint2(input.vPos.xy);
    uint uStartOffsetAddress = SCREEN_WIDTH * vPos.y + vPos.x;

    // Fetch offset of first fragment for current pixel
    uint uOffset = StartOffsetBufferSRV.Load(uStartOffsetAddress);

    // Parse linked list for all fragments at this position
    float4 FinalColor = float4(0, 0, 0, 0);
    while (uOffset != 0xFFFFFFFF)  // 0xFFFFFFFF is magic value
    {
        // Retrieve pixel at current offset
        FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

        // Process pixel as required
        ProcessPixel(Element, FinalColor);

        // Retrieve next offset
        uOffset = Element.uNext;
    }
    return FinalColor;
}
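The same traversal can be sketched on the CPU. The function below walks one pixel's list and returns the stored payloads, newest first. `CollectFragments` and `Node` are hypothetical names used only for illustration, and the payload is again a single uint.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kEndOfList = 0xFFFFFFFFu;

struct Node {
    std::uint32_t data;
    std::uint32_t next;
};

// Walks one pixel's list starting at its head offset and collects the payloads.
// Fragments come out newest first, because each insertion becomes the new head.
std::vector<std::uint32_t> CollectFragments(const std::vector<std::uint32_t>& heads,
                                            const std::vector<Node>& nodes,
                                            std::uint32_t pixelIndex) {
    assert(pixelIndex < heads.size());
    std::vector<std::uint32_t> out;
    std::uint32_t offset = heads[pixelIndex];
    while (offset != kEndOfList) {
        const Node& node = nodes[offset];
        out.push_back(node.data);
        offset = node.next;
    }
    return out;
}
```

Note that the list order reflects submission order only, not depth, which is why algorithms such as OIT need a sorting step afterwards.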


Order-Independent Transparency via Per-Pixel Linked Lists

Nicolas Thibieroz

European ISV Relations

AMD


Description

  • Straight application of the linked list algorithm

  • Stores transparent fragments into PPLL

  • Rendering phase sorts pixels in a back-to-front order and blends them manually in a pixel shader

    • Blend mode can be unique per-pixel!

  • Special case for MSAA support


Linked List Structure

  • Optimize performance by reducing amount of data to write to/read from UAV

  • E.g. uint instead of float4 for color

  • Example data structure for OIT:

    struct FragmentAndLinkBuffer_STRUCT
    {
        uint uPixelColor;  // Packed pixel color
        uint uDepth;       // Pixel depth
        uint uNext;        // Address of next link
    };

  • May also get away with packing color and depth into the same uint (if all fragments share the same alpha)

    • 16 bits color (565) + 16 bits depth

    • Performance/memory/quality trade-off
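As an illustration of the uint-instead-of-float4 idea, here is a small sketch of packing an RGBA color into 32 bits and unpacking it again. The channel layout (R in the low byte, A in the high byte) and the names `PackColor` / `UnpackColor` are assumptions; the slides do not show how their packing helpers are written.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Color {
    float r, g, b, a;
};

// Converts a [0,1] channel to an 8-bit value with rounding.
static std::uint32_t ToByte(float v) {
    assert(v >= 0.0f && v <= 1.0f);
    return static_cast<std::uint32_t>(std::lround(v * 255.0f));
}

// Packs four [0,1] channels into 8 bits each: R in the low byte, A in the high byte.
std::uint32_t PackColor(const Color& c) {
    return ToByte(c.r) | (ToByte(c.g) << 8) | (ToByte(c.b) << 16) | (ToByte(c.a) << 24);
}

// Inverse of PackColor, up to 8-bit quantization.
Color UnpackColor(std::uint32_t p) {
    return Color{(p & 0xFFu) / 255.0f, ((p >> 8) & 0xFFu) / 255.0f,
                 ((p >> 16) & 0xFFu) / 255.0f, (p >> 24) / 255.0f};
}
```

The node shrinks from 16 bytes of color to 4, at the cost of 8-bit precision per channel, which is the performance/quality trade-off the slide refers to.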


Visible Fragments Only!

  • Use [earlydepthstencil] in front of Linked List creation pixel shader

  • This ensures only transparent fragments that pass the depth test are stored

    • i.e. Visible fragments!

  • Allows performance savings and rendering correctness!

    [earlydepthstencil]
    float PS_StoreFragments(PS_INPUT input) : SV_Target
    {
        ...
    }


Sorting Pixels

  • Sorting in place requires R/W access to Linked List

  • Sparse memory accesses = slow!

  • Better way is to copy all pixels into array of temp registers

    • Then do the sorting

  • Temp array declaration means a hard limit on the number of fragments per screen coordinate

    • Required trade-off for performance


Sorting and Blending

  • Blend fragments back to front in PS

    • Blending algorithm up to app

    • Example: SRCALPHA-INVSRCALPHA

    • Or unique per pixel! (stored in fragment data)

  • Background passed as input texture

    • Actual HW blending mode disabled

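[Figure: fragments with depths 0.95, 0.93, 0.87 and 0.98 are read from the linked list into the temp array, sorted, blended over the background color in the PS, and written to the render target.]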


Storing Pixels for Sorting

(...)
static uint2 SortedPixels[MAX_SORTED_PIXELS];

// Parse linked list for all pixels at this position
// and store them into temp array for later sorting
int nNumPixels = 0;
while (uOffset != 0xFFFFFFFF)
{
    // Retrieve pixel at current offset
    FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

    // Copy pixel data into temp array
    SortedPixels[nNumPixels++] = uint2(Element.uPixelColor, Element.uDepth);

    // Retrieve next offset (stop early once the temp array is full)
    [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ? 0xFFFFFFFF : Element.uNext;
}

// Sort pixels in-place
SortPixelsInPlace(SortedPixels, nNumPixels);
(...)
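The slides call `SortPixelsInPlace` without showing it. One plausible shape for it is an insertion sort, which suits the small, fixed-size temp array. The sketch below is an assumption, not the authors' code: it uses a `PackedPixel` struct in place of `uint2` and assumes that larger depth values are farther from the camera.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors uint2(color, depth) from the temp array.
struct PackedPixel {
    std::uint32_t color;
    std::uint32_t depth;
};

// Insertion sort by descending depth, so index 0 holds the farthest fragment
// and iterating from index 0 upwards blends back to front.
void SortPixelsInPlace(PackedPixel* pixels, int count) {
    assert(count >= 0);
    for (int i = 1; i < count; ++i) {
        const PackedPixel key = pixels[i];
        int j = i - 1;
        while (j >= 0 && pixels[j].depth < key.depth) {
            pixels[j + 1] = pixels[j];
            --j;
        }
        pixels[j + 1] = key;
    }
}
```

Insertion sort is quadratic in the worst case, but with a hard cap such as MAX_SORTED_PIXELS the lists stay short enough for that to be acceptable.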


Pixel Blending in PS

(...)
// Retrieve current color from background texture
float4 vCurrentColor = BackgroundTexture.Load(int3(vPos.xy, 0));

// Rendering pixels using SRCALPHA-INVSRCALPHA blending
for (int k = 0; k < nNumPixels; k++)
{
    // Retrieve next unblended furthermost pixel
    float4 vPixColor = UnpackFromUint(SortedPixels[k].x);

    // Manual blending between current fragment and previous one
    vCurrentColor.xyz = lerp(vCurrentColor.xyz, vPixColor.xyz, vPixColor.w);
}

// Return manually-blended color
return vCurrentColor;
}
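A small CPU sketch of the same SRCALPHA-INVSRCALPHA loop shows why the sort matters: blending the same two fragments in a different order gives a different color. `Rgb`, `Rgba` and `BlendBackToFront` are hypothetical names for illustration.

```cpp
#include <cassert>

struct Rgb {
    float r, g, b;
};

struct Rgba {
    float r, g, b, a;
};

static float Lerp(float a, float b, float t) { return a + (b - a) * t; }

// Blends fragments over the background; the caller must pass them ordered
// back to front (farthest fragment first).
Rgb BlendBackToFront(Rgb background, const Rgba* fragments, int count) {
    assert(count >= 0);
    Rgb current = background;
    for (int k = 0; k < count; ++k) {
        const Rgba& f = fragments[k];
        current.r = Lerp(current.r, f.r, f.a);
        current.g = Lerp(current.g, f.g, f.a);
        current.b = Lerp(current.b, f.b, f.a);
    }
    return current;
}
```

With a black background, half-transparent red behind half-transparent green yields (0.25, 0.5, 0), while the reverse order yields (0.5, 0.25, 0).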


OIT via Per-Pixel Linked Lists with MSAA Support


Sample Coverage

  • Storing individual samples into Linked Lists requires a huge amount of memory

    • ... and performance will suffer!

  • Solution is to store transparent pixels into PPLL as before

  • But including sample coverage too!

    • Requires as many bits as the MSAA mode has samples

  • Declare SV_COVERAGE in PS structure

    struct PS_INPUT
    {
        float3 vNormal   : NORMAL;
        float2 vTex      : TEXCOORD;
        float4 vPos      : SV_POSITION;
        uint   uCoverage : SV_COVERAGE;
    };


Linked List Structure

  • Almost unchanged from previously

  • Depth is now packed into 24 bits

  • 8 Bits are used to store coverage

    struct FragmentAndLinkBuffer_STRUCT
    {
        uint uPixelColor;        // Packed pixel color
        uint uDepthAndCoverage;  // Depth + coverage
        uint uNext;              // Address of next link
    };


Sample Coverage Example

  • Third sample is covered

    • uCoverage = 0x04 (0100 in binary)

      Element.uDepthAndCoverage =
          ( uint(In.vPos.z * ((1 << 24) - 1)) << 8 ) | In.uCoverage;

[Figure: pixel center and sample positions; the third sample is covered]
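The depth and coverage packing can be checked on the CPU with a short sketch. It assumes depth is in [0, 1] and that coverage fits in 8 bits, as on the slide. `UnpackCoverage` is named after the helper the slides call later, while `PackDepthAndCoverage` and `UnpackDepth24` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Depth in [0,1] goes to the upper 24 bits, the MSAA coverage mask to the lower 8 bits.
std::uint32_t PackDepthAndCoverage(float depth, std::uint32_t coverage) {
    assert(depth >= 0.0f && depth <= 1.0f);
    assert(coverage <= 0xFFu);
    const std::uint32_t depth24 = static_cast<std::uint32_t>(depth * 16777215.0f);  // 2^24 - 1
    return (depth24 << 8) | coverage;
}

std::uint32_t UnpackCoverage(std::uint32_t packed) { return packed & 0xFFu; }

std::uint32_t UnpackDepth24(std::uint32_t packed) { return packed >> 8; }
```

Because depth sits in the high bits, comparing two packed values orders them by depth first, so the packed uint can still serve as a sort key.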


Rendering Samples (1)

  • Rendering phase needs to be able to write individual samples

  • Thus PS is run at sample frequency

    • Can be done by declaring SV_SAMPLEINDEX in input structure

  • Parse linked list and store pixels into temp array for later sorting

    • Similar to non-MSAA case

    • The difference is that a fragment is stored only if its coverage mask includes the sample index being shaded


Rendering Samples (2)

static uint2 SortedPixels[MAX_SORTED_PIXELS];

// Parse linked list for all pixels at this position
// and store them into temp array for later sorting
int nNumPixels = 0;
while (uOffset != 0xFFFFFFFF)
{
    // Retrieve pixel at current offset
    FragmentAndLinkBuffer_STRUCT Element = FLBufferSRV[uOffset];

    // Retrieve pixel coverage from linked list element
    uint uCoverage = UnpackCoverage(Element.uDepthAndCoverage);
    if ( uCoverage & (1 << In.uSampleIndex) )
    {
        // Coverage matches current sample so copy pixel
        SortedPixels[nNumPixels++] = uint2(Element.uPixelColor, Element.uDepthAndCoverage);
    }

    // Retrieve next offset (stop early once the temp array is full)
    [flatten] uOffset = (nNumPixels >= MAX_SORTED_PIXELS) ? 0xFFFFFFFF : Element.uNext;
}
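The per-sample filter boils down to one bit test per fragment. A CPU sketch, assuming the coverage mask occupies the low 8 bits of the packed value (`Fragment` and `FragmentsForSample` are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Fragment {
    std::uint32_t color;
    std::uint32_t depthAndCoverage;  // depth in the upper 24 bits, coverage in the lower 8
};

// Keeps only the fragments whose coverage mask includes the given sample index.
std::vector<Fragment> FragmentsForSample(const std::vector<Fragment>& all,
                                         std::uint32_t sampleIndex) {
    assert(sampleIndex < 8);
    std::vector<Fragment> kept;
    for (const Fragment& f : all) {
        const std::uint32_t coverage = f.depthAndCoverage & 0xFFu;
        if (coverage & (1u << sampleIndex)) kept.push_back(f);
    }
    return kept;
}
```

A fragment with coverage 0x04 is therefore seen only when shading sample 2, while one with coverage 0x0F contributes to samples 0 through 3.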


DEMO

OIT Linked List Demo


Holger Gruen, European ISV Relations, AMD

Direct3D 11 Indirect Illumination


Indirect Illumination Introduction 1

  • Real-time Indirect illumination is an active research topic

  • Numerous approaches exist

    • Reflective Shadow Maps (RSM) [Dachsbacher/Stamminger05]

    • Splatting Indirect Illumination [Dachsbacher/Stamminger2006]

    • Multi-Res Splatting of Illumination [Wyman2009]

    • Light propagation volumes [Kaplanyan2009]

    • Approximating Dynamic Global Illumination in Image Space [Ritschel2009]

  • Only a few support indirect shadows

    • Imperfect Shadow Maps [Ritschel/Grosch2008]

    • Micro-Rendering for Scalable, Parallel Final Gathering (SSDO) [Ritschel2010]

    • Cascaded light propagation volumes for real-time indirect illumination [Kaplanyan/Dachsbacher2010]

  • Most approaches can be extended in some way to multi-bounce lighting


Indirect Illumination Introduction 2

  • This section will cover

    • An efficient and simple DX9-compliant RSM-based implementation for smooth one-bounce indirect illumination

      • Indirect shadows are ignored here

    • A Direct3D 11 technique that traces rays to compute indirect shadows

      • Part of this technique could generally be used for ray-tracing dynamic scenes


    Indirect Illumination w/o Indirect Shadows

    • Draw scene g-buffer

    • Draw Reflective Shadowmap (RSM)

      • RSM shows the part of the scene that receives direct light from the light source

    • Draw Indirect Light buffer at ¼ res

      • RSM texels are used as light sources on g-buffer pixels for indirect lighting

    • Upsample Indirect Light (IL)

    • Draw final image adding IL


    Step 1

    • G-Buffer needs to allow reconstruction of

      • World/Camera space position

      • World/Camera space normal

      • Color/ Albedo

    • DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows


    Step 2

    • RSM needs to allow reconstruction of

      • World space position

      • World space normal

      • Color/ Albedo

    • Only draw emitters of indirect light

    • DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows


    Step 3

    • Render a ¼ res IL as a deferred op

    • Transform g-buffer pixel to RSM space

      • -> light space -> project to RSM texel space

    • Use a kernel of RSM texels as light sources

      • RSM texels are also called Virtual Point Lights (VPLs)

    • Kernel size depends on

      • Desired speed

      • Desired look of the effect

      • RSM resolution


    Computing IL at a G-buf Pixel 1

    Sum up contribution of all VPLs in the kernel


    Computing IL at a G-buf Pixel 2

    [Figure: formula for the light one RSM texel/VPL contributes to a g-buffer pixel]

    This term is very similar to terms used in radiosity form factor computations
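The formula itself did not survive in this transcript. As a hedged stand-in, the sketch below evaluates the per-VPL term from the Reflective Shadow Maps paper cited earlier, flux * max(0, np.(x - xp)) * max(0, n.(xp - x)) / |x - xp|^4, on the assumption that the slide shows something close to it. `Vec3` and `VplIrradiance` are hypothetical names, and flux is reduced to a scalar.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

static Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Irradiance at a receiver (position x, normal n) from one VPL
// (position xp, normal np, scalar flux), in the RSM formulation:
//   flux * max(0, np.(x - xp)) * max(0, n.(xp - x)) / |x - xp|^4
float VplIrradiance(Vec3 x, Vec3 n, Vec3 xp, Vec3 np, float flux) {
    const Vec3 d = Sub(x, xp);
    const float dist2 = Dot(d, d);
    assert(dist2 > 0.0f);
    const float emit = std::max(0.0f, Dot(np, d));   // VPL must face the receiver
    const float recv = std::max(0.0f, -Dot(n, d));   // receiver must face the VPL
    return flux * emit * recv / (dist2 * dist2);
}
```

The indirect light at a g-buffer pixel would then be the sum of this term over all VPLs in the kernel. The two clamped dot products are what make it resemble a radiosity form factor.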


    Computing IL at a G-buf Pixel 3

    A naive solution for smooth IL needs to consider four VPL kernels with centers at t0, t1, t2 and t3.

    stx : sub RSM texel x position [0.0, 1.0[

    sty : sub RSM texel y position [0.0, 1.0[


    Computing IL at a g-buf pixel 4

    IndirectLight =

    (1.0f-sty) * ((1.0f-stx) * [VPL kernel] + stx * [VPL kernel]) +

    (0.0f+sty) * ((1.0f-stx) * [VPL kernel] + stx * [VPL kernel])

    Each [VPL kernel] is one of the four kernel sums centered at t0, t1, t2 and t3 (shown as images on the slide)

    stx : sub texel x position [0.0, 1.0[

    sty : sub texel y position [0.0, 1.0[

    Evaluation of 4 big VPL kernels is slow


    Computing IL at a g-buf pixel 5

    SmoothIndirectLight =

    (1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1)+

    (0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7)+B4

    stx : sub RSM texel x position of g-buf pix [0.0, 1.0[

    sty : sub RSM texel y position of g-buf pix [0.0, 1.0[

    This trick is probably known to some of you already. See backup for a detailed explanation!
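The backup slides are not part of this transcript, so the following check rests on an assumption: B0 to B8 are partial VPL sums over a row-major 3x3 partition of the area covered by the four overlapping kernels (corners, edges and center). Under that assumption, the sketch confirms numerically that the SmoothIndirectLight expression equals the bilinear blend of the four full kernel sums. `NaiveSmoothIL` and `SmoothIL` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Naive form: bilinear blend of four kernel sums. Each kernel covers a 2x2
// group of the nine blocks (B is row-major: B[0..2] top, B[6..8] bottom).
float NaiveSmoothIL(const float B[9], float stx, float sty) {
    assert(stx >= 0.0f && stx < 1.0f && sty >= 0.0f && sty < 1.0f);
    const float topLeft     = B[0] + B[1] + B[3] + B[4];
    const float topRight    = B[1] + B[2] + B[4] + B[5];
    const float bottomLeft  = B[3] + B[4] + B[6] + B[7];
    const float bottomRight = B[4] + B[5] + B[7] + B[8];
    return (1.0f - sty) * ((1.0f - stx) * topLeft + stx * topRight) +
           sty * ((1.0f - stx) * bottomLeft + stx * bottomRight);
}

// Factored form from the slide: every block is summed once and weighted once.
float SmoothIL(const float B[9], float stx, float sty) {
    return (1.0f - sty) * (((1.0f - stx) * (B[0] + B[3]) + stx * (B[2] + B[5])) + B[1]) +
           sty * (((1.0f - stx) * (B[6] + B[3]) + stx * (B[8] + B[5])) + B[7]) + B[4];
}
```

The saving comes from the overlap: the four naive kernels revisit the shared blocks up to four times, while the factored form touches each VPL only once.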


    Indirect Light Buffer


    Step 4

    • Indirect Light buffer is ¼ res

    • Perform a bilateral upsampling step

      • See Peter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder. "Image-Based Proxy Accumulation for Real-Time Soft Global Illumination". Pacific Graphics 2007

    • Result is a full resolution IL
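As a rough illustration of what a bilateral upsampling step does, the sketch below blends four low-res IL taps for one full-res pixel, scaling each bilinear weight by how closely the tap's depth matches the pixel's depth. This is a generic depth-aware variant, not the exact weighting of the cited paper; `Tap`, `BilateralUpsample` and the epsilon constant are assumptions, and real implementations often compare normals as well.

```cpp
#include <cassert>
#include <cmath>

// One low-resolution tap: its IL value, its depth, and its bilinear weight.
struct Tap {
    float il;
    float depth;
    float bilinearWeight;
};

// Depth-aware blend of the four nearest low-res taps for one full-res pixel.
float BilateralUpsample(const Tap taps[4], float fullResDepth) {
    const float epsilon = 1e-4f;  // hypothetical constant, avoids division by zero
    float sum = 0.0f;
    float weightSum = 0.0f;
    for (int i = 0; i < 4; ++i) {
        const float depthDiff = std::fabs(taps[i].depth - fullResDepth);
        const float w = taps[i].bilinearWeight / (epsilon + depthDiff);
        sum += w * taps[i].il;
        weightSum += w;
    }
    assert(weightSum > 0.0f);
    return sum / weightSum;
}
```

On flat surfaces this reduces to plain bilinear filtering. Across a depth edge, taps from the other surface receive almost no weight, which keeps indirect light from bleeding over silhouettes.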


    Step 5

    • Combine

      • Direct Illumination

      • Indirect Illumination

      • Shadows (not mentioned)


    Combined Image

    DEMO

    ~280 FPS on an HD5970 @ 1280x1024 for a 15x15 VPL kernel

    Deferred IL pass + bilateral upsampling costs ~2.5 ms


    How to add Indirect Shadows

    • Use a CS and the linked lists technique

      • Insert blockers of IL into 3D grid of lists – let's use triangles for now

        • see backup for alternative data structure

    • Look at a kernel of VPLs again

    • Only accumulate light of VPLs that are occluded by blocker tris

      • Trace rays through 3d grid to detect occluded VPLs

      • Render low res buffer only

    • Subtract blocked indirect light from IL buffer

      • Subtract a blurred version of low res version

        • Blur is combined bilateral blurring/upsampling


    Insert tris into 3D grid of triangle lists

    Rasterize dynamic blockers to 3D grid using a CS and atomics

    World space 3D grid of triangle lists around IL blockers laid out in a UAV

    [Figure: scene with the 3D grid around the blockers; one grid corner is labeled (0,1,0)]

    eol = End of list (0xffffffff)
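The grid insertion reuses the per-pixel linked list pattern, with grid cells in place of pixels. The CPU sketch below shows one way it could look. How the talk's compute shader decides which cells a triangle touches is not described in this transcript, so the sketch conservatively inserts the triangle into every cell of its bounding box, which is an assumption. `TriangleGrid`, `TriNode` and `InsertTriangle` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kEol = 0xFFFFFFFFu;  // end of list

struct TriNode {
    std::uint32_t triangle;  // index of the blocker triangle
    std::uint32_t next;      // next node in this cell's list, or kEol
};

struct TriangleGrid {
    std::uint32_t dim;                              // cells per axis
    std::vector<std::atomic<std::uint32_t>> heads;  // one list head per cell
    std::vector<TriNode> nodes;                     // shared node pool
    std::atomic<std::uint32_t> counter{0};

    TriangleGrid(std::uint32_t d, std::size_t maxNodes)
        : dim(d), heads(static_cast<std::size_t>(d) * d * d), nodes(maxNodes) {
        for (auto& h : heads) h.store(kEol);
    }

    std::size_t CellIndex(std::uint32_t x, std::uint32_t y, std::uint32_t z) const {
        assert(x < dim && y < dim && z < dim);
        return (static_cast<std::size_t>(z) * dim + y) * dim + x;
    }
};

// Adds the triangle to every cell in the inclusive cell range [lo, hi],
// i.e. its bounding box in grid coordinates. Stops if the pool is full.
void InsertTriangle(TriangleGrid& grid, std::uint32_t triangle,
                    const std::uint32_t lo[3], const std::uint32_t hi[3]) {
    for (std::uint32_t z = lo[2]; z <= hi[2]; ++z)
        for (std::uint32_t y = lo[1]; y <= hi[1]; ++y)
            for (std::uint32_t x = lo[0]; x <= hi[0]; ++x) {
                const std::uint32_t slot = grid.counter.fetch_add(1);
                if (slot >= grid.nodes.size()) return;
                const std::uint32_t old = grid.heads[grid.CellIndex(x, y, z)].exchange(slot);
                grid.nodes[slot] = TriNode{triangle, old};
            }
}
```

A ray traced through the grid would then walk only the lists of the cells it passes through, testing the triangles stored there.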


    3D Grid Demo


    Indirect Light Buffer

    [Figure: indirect light buffer, annotated with the emitter of green light, the blocker of green light, and the expected indirect shadow]


    Blocked Indirect Light


    Indirect Light Buffer


    Subtracting Blocked IL


    Final Image

    DEMO

    ~70 FPS on a HD5970 @ 1280x1024

    ~300 million rays per second for Indirect Shadows

    Ray casting costs ~9 ms


    Future directions

    • Speed up IL rendering

      • Render IL at even lower res

      • Look into multi-res RSMs

  • Speed up ray-tracing

    • Per pixel array of lists for depth buckets

    • Other data structures

  • Raytrace other primitive types

    • Splats, fuzzy ellipsoids etc.

    • Proxy geometry or bounding volumes of blockers

  • Get rid of Interlocked*() ops

    • Just mark grid cells as occupied

    • Lower quality but could work on earlier hardware

    • Need to solve self-occlusion issues


    Q&A

    Holger Gruen

    Nicolas Thibieroz

    • Credits for the basic idea of how to implement PPLL under Direct3D 11 go to Jakub Klarowicz (Techland), Holger Gruen and Nicolas Thibieroz (AMD)

