slide1
Download
Skip this Video
Download Presentation
OIT and Indirect Illumination using DX11 Linked Lists

Loading in 2 Seconds...

play fullscreen
1 / 63

OIT and Indirect Illumination using DX11 Linked Lists - PowerPoint PPT Presentation


  • 79 Views
  • Uploaded on

OIT and Indirect Illumination using DX11 Linked Lists. Holger Gruen AMD ISV Relations Nicolas Thibieroz AMD ISV Relations. Agenda. Introduction Linked List Rendering Order Independent Transparency Indirect Illumination Q&A. Introduction.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' OIT and Indirect Illumination using DX11 Linked Lists' - germaine-vega


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
oit and indirect illumination using dx11 linked lists
OIT and Indirect Illumination using DX11 Linked Lists

Holger Gruen AMD ISV Relations

Nicolas Thibieroz AMD ISV Relations

agenda
Agenda
  • Introduction
  • Linked List Rendering
  • Order Independent Transparency
  • Indirect Illumination
  • Q&A
introduction
Introduction
  • Direct3D 11 HW opens the door to many new rendering algorithms
  • In particular per pixel linked lists allow for a number of new techniquesOIT, Indirect Shadows, Ray Tracing of dynamic scenes, REYES surface dicing, custom AA, Irregular Z-buffering, custom blending, Advanced Depth of Field, etc.
  • This talk will walk you through:A DX11 implementation of per-pixel linked list and two effects that utilize this techique
    • OIT
    • Indirect Illumination
per pixel linked lists with direct3d 11
Per-Pixel Linked Lists with Direct3D 11

Nicolas Thibieroz

European ISV Relations

AMD

Element

Element

Element

Element

Link

Link

Link

Link

why linked lists
Why Linked Lists?
  • Data structure useful for programming
  • Very hard to implement efficiently with previous real-time graphics APIs
  • DX11 allows efficient creation and parsing of linked lists
  • Per-pixel linked lists
    • A collection of linked lists enumerating all pixels belonging to the same screen position

Element

Element

Element

Element

Link

Link

Link

Link

two step process
Two-step process
  • 1) Linked List Creation
    • Store incoming fragments into linked lists
  • 2) Rendering from Linked List
    • Linked List traversal and processing of stored fragments
ps5 0 and uavs
PS5.0 and UAVs
  • Uses a Pixel Shader 5.0 to store fragments into linked lists
    • Not a Compute Shader 5.0!
  • Uses atomic operations
  • Two UAV buffers required
    • - “Fragment & Link” buffer
    • - “Start Offset” buffer

UAV = Unordered Access View

fragment link buffer
Fragment & Link Buffer
  • The “Fragment & Link” buffer contains data and link for all fragments to store
  • Must be large enough to store all fragments
  • Created with Counter support
    • D3D11_BUFFER_UAV_FLAG_COUNTER flag in UAV view
  • Declaration:

structFragmentAndLinkBuffer_STRUCT

{

FragmentData_STRUCTFragmentData; // Fragment data

uintuNext; // Link to next fragment

};

RWStructuredBuffer <FragmentAndLinkBuffer_STRUCT> FLBuffer;

start offset buffer
Start Offset Buffer
  • The “Start Offset” buffer contains the offset of the last fragment written at every pixel location
  • Screen-sized:(width * height * sizeof(UINT32) )
  • Initialized to magic value (e.g. -1)
    • Magic value indicates no more fragments are stored (i.e. end of the list)
  • Declaration:

RWByteAddressBufferStartOffsetBuffer;

linked list creation 1
Linked List Creation (1)
  • No color Render Target bound!
    • No rendering yet, just storing in L.L.
  • Depth buffer bound if needed
    • OIT will need it in a few slides
  • UAVs bounds as input/output:
    • StartOffsetBuffer (R/W)
    • FragmentAndLinkBuffer (W)
linked list creation 2a
Linked List Creation (2a)

Start Offset Buffer

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

Viewport

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

0

Fragment and Link Buffer

Counter =

Fragment and Link Buffer

Fragment and Link Buffer

Fragment and Link Buffer

linked list creation 2b
Linked List Creation (2b)

Start Offset Buffer

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

Viewport

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

1

0

Fragment and Link Buffer

Counter =

Fragment and Link Buffer

-1

Fragment and Link Buffer

-1

Fragment and Link Buffer

linked list creation 2c
Linked List Creation (2c)

Start Offset Buffer

-1

-1

-1

-1

-1

-1

-1

0

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

Viewport

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

1

3

2

Fragment and Link Buffer

Counter =

Fragment and Link Buffer

-1

Fragment and Link Buffer

-1

-1

Fragment and Link Buffer

linked list creation 2d
Linked List Creation (2d)

Start Offset Buffer

-1

-1

-1

-1

-1

-1

-1

0

-1

-1

-1

-1

-1

-1

-1

-1

-1

-1

Viewport

-1

-1

-1

-1

-1

-1

-1

-1

-1

1

2

-1

-1

-1

-1

-1

-1

-1

4

3

5

Fragment and Link Buffer

Counter =

-1

-1

-1

0

Fragment and Link Buffer

-1

Fragment and Link Buffer

linked list creation code
Linked List Creation - Code

float PS_StoreFragments(PS_INPUT input) : SV_Target

{

// Calculate fragment data (color, depth, etc.)

FragmentData_STRUCTFragmentData = ComputeFragment();

// Retrieve current pixel count and increase counter

uintuPixelCount = FLBuffer.IncrementCounter();

// Exchange offsets in StartOffsetBuffer

uintvPos = uint(input.vPos);

uintuStartOffsetAddress= 4 * ( (SCREEN_WIDTH*vPos.y) + vPos.x );

uintuOldStartOffset;

StartOffsetBuffer.InterlockedExchange(uStartOffsetAddress, uPixelCount, uOldStartOffset);

// Add new fragment entry in Fragment & Link Buffer

FragmentAndLinkBuffer_STRUCT Element;

Element.FragmentData = FragmentData;

Element.uNext = uOldStartOffset;

FLBuffer[uPixelCount] = Element;

}

rendering pixels 1
Rendering Pixels (1)
  • “Start Offset” Buffer and “Fragment & Link” Buffer now bound as SRV

Buffer<uint> StartOffsetBufferSRV;

StructuredBuffer<FragmentAndLinkBuffer_STRUCT> FLBufferSRV;

  • Render a fullscreen quad
  • For each pixel, parse the linked list and retrieve fragments for this screen position
  • Process list of fragments as required
    • Depends on algorithm
    • e.g. sorting, finding maximum, etc.

SRV = Shader Resource View

rendering from linked list
Rendering from Linked List

Start Offset Buffer

-1

-1

-1

-1

-1

-1

-1

3

3

4

-1

-1

-1

-1

-1

-1

-1

-1

-1

Render Target

-1

-1

-1

-1

-1

-1

-1

-1

-1

1

2

-1

-1

-1

-1

-1

-1

-1

Fragment and Link Buffer

-1

-1

-1

-1

0

0

Fragment and Link Buffer

-1

Fragment and Link Buffer

rendering pixels 2
Rendering Pixels (2)

float4 PS_RenderFragments(PS_INPUT input) : SV_Target

{

// Calculate UINT-aligned start offset buffer address

uintvPos = uint(input.vPos);

uintuStartOffsetAddress = SCREEN_WIDTH*vPos.y + vPos.x;

// Fetch offset of first fragment for current pixel

uintuOffset = StartOffsetBufferSRV.Load(uStartOffsetAddress);

// Parse linked list for all fragments at this position

float4 FinalColor=float4(0,0,0,0);

while (uOffset!=0xFFFFFFFF) // 0xFFFFFFFF is magic value

{

// Retrieve pixel at current offset

Element=FLBufferSRV[uOffset];

// Process pixel as required

ProcessPixel(Element, FinalColor);

// Retrieve next offset

uOffset = Element.uNext;

}

return (FinalColor);

}

order independent transparency via per pixel linked lists
Order-Independent Transparency via Per-Pixel Linked Lists

Nicolas Thibieroz

European ISV Relations

AMD

description
Description
  • Straight application of the linked list algorithm
  • Stores transparent fragments into PPLL
  • Rendering phase sorts pixels in a back-to-front order and blends them manually in a pixel shader
    • Blend mode can be unique per-pixel!
  • Special case for MSAA support
linked list structure
Linked List Structure
  • Optimize performance by reducing amount of data to write to/read from UAV
  • E.g. uint instead of float4 for color
  • Example data structure for OIT:

structFragmentAndLinkBuffer_STRUCT

{

uintuPixelColor; // Packed pixel color

uintuDepth; // Pixel depth

uintuNext; // Address of next link

};

  • May also get away with packed color and depth into the same uint! (if same alpha)
    • 16 bits color (565) + 16 bits depth
    • Performance/memory/quality trade-off
visible fragments only
Visible Fragments Only!
  • Use [earlydepthstencil] in front of Linked List creation pixel shader
  • This ensures only transparent fragments that pass the depth test are stored
    • i.e. Visible fragments!
  • Allows performance savings and rendering correctness!

[earlydepthstencil]

float PS_StoreFragments(PS_INPUT input) : SV_Target

{

...

}

sorting pixels
Sorting Pixels
  • Sorting in place requires R/W access to Linked List
  • Sparse memory accesses = slow!
  • Better way is to copy all pixels into array of temp registers
    • Then do the sorting
  • Temp array declaration means a hard limit on number of pixel per screen coordinates
    • Required trade-off for performance
sorting and blending
Sorting and Blending

0.95

0.93

0.87

0.98

  • Blend fragments back to front in PS
    • Blending algorithm up to app
    • Example: SRCALPHA-INVSRCALPHA
    • Or unique per pixel! (stored in fragment data)
  • Background passed as input texture
    • Actual HW blending mode disabled

Background

color

Temp Array

Render Target

PS color

0.98

0.95

0.87

0.93

-1

34

12

0

storing pixels for sorting
Storing Pixels for Sorting

(...)

static uint2 SortedPixels[MAX_SORTED_PIXELS];

// Parse linked list for all pixels at this position

// and store them into temp array for later sorting

intnNumPixels=0;

while (uOffset!=0xFFFFFFFF)

{

// Retrieve pixel at current offset

Element=FLBufferSRV[uOffset];

// Copy pixel data into temp array

SortedPixels[nNumPixels++]=

uint2(Element.uPixelColor, Element.uDepth);

// Retrieve next offset

[flatten]uOffset = (nNumPixels>=MAX_SORTED_PIXELS) ?

0xFFFFFFFF : Element.uNext;

}

// Sort pixels in-place

SortPixelsInPlace(SortedPixels, nNumPixels);

(...)

pixel blending in ps
Pixel Blending in PS

(...)

// Retrieve current color from background texture

float4 vCurrentColor=BackgroundTexture.Load(int3(vPos.xy, 0));

// Rendering pixels using SRCALPHA-INVSRCALPHA blending

for (int k=0; k<nNumPixels; k++)

{

// Retrieve next unblended furthermost pixel

float4 vPixColor= UnpackFromUint(SortedPixels[k].x);

// Manual blending between current fragment and previous one

vCurrentColor.xyz= lerp(vCurrentColor.xyz, vPixColor.xyz,

vPixColor.w);

}

// Return manually-blended color

return vCurrentColor;

}

sample coverage
Sample Coverage
  • Storing individual samples into Linked Lists requires a huge amount of memory
    • ... and performance will suffer!
  • Solution is to store transparent pixels into PPLL as before
  • But including sample coverage too!
    • Requires as many bits as MSAA mode
  • Declare SV_COVERAGE in PS structure

struct PS_INPUT

{

float3 vNormal : NORMAL;

float2 vTex : TEXCOORD;

float4 vPos : SV_POSITION;

uintuCoverage : SV_COVERAGE;

}

linked list structure1
Linked List Structure
  • Almost unchanged from previously
  • Depth is now packed into 24 bits
  • 8 Bits are used to store coverage

structFragmentAndLinkBuffer_STRUCT

{

uintuPixelColor; // Packed pixel color

uintuDepthAndCoverage; // Depth + coverage

uintuNext; // Address of next link

};

sample coverage example
Sample Coverage Example
  • Third sample is covered
    • uCoverage = 0x04 (0100 in binary)

Element.uDepthAndCoverage =

( In.vPos.z*(2^24-1) << 8 ) | In.uCoverage;

Pixel Center

Sample

rendering samples 1
Rendering Samples (1)
  • Rendering phase needs to be able to write individual samples
  • Thus PS is run at sample frequency
    • Can be done by declaring SV_SAMPLEINDEX in input structure
  • Parse linked list and store pixels into temp array for later sorting
    • Similar to non-MSAA case
    • Difference is to only store sample if coverage matches sample index being rasterized
slide35

Rendering Samples (2)

static uint2 SortedPixels[MAX_SORTED_PIXELS];

// Parse linked list for all pixels at this position

// and store them into temp array for later sorting

intnNumPixels=0;

while (uOffset!=0xFFFFFFFF)

{

// Retrieve pixel at current offset

Element=FLBufferSRV[uOffset];

// Retrieve pixel coverage from linked list element

uintuCoverage=UnpackCoverage(Element.uDepthAndCoverage);

if ( uCoverage & (1<<In.uSampleIndex) )

{

// Coverage matches current sample so copy pixel

SortedPixels[nNumPixels++]=Element;

}

// Retrieve next offset

[flatten]uOffset = (nNumPixels>=MAX_SORTED_PIXELS) ?

0xFFFFFFFF : Element.uNext;

}

holger gruen european isv relations amd
Holger GruenEuropean ISV Relations AMD

Direct3D 11 Indirect Illumination

indirect illumination introduction 1
Indirect Illumination Introduction 1
  • Real-time Indirect illumination is an active research topic
  • Numerous approaches existReflective Shadow Maps (RSM) [Dachsbacher/Stammiger05]Splatting Indirect Illumination [Dachsbacher/Stammiger2006]Multi-Res Splatting of Illumination [Wyman2009]Light propagation volumes [Kapalanyan2009]Approximating Dynamic Global Illumination in Image Space [Ritschel2009] Only a few support indirect shadowsImperfect Shadow Maps [Ritschel/Grosch2008]Micro-Rendering for Scalable, Parallel Final Gathering(SSDO) [Ritschel2010] Cascaded light propagation volumes for real-time indirect illumination [Kapalanyan/Dachsbacher2010]
  • Most approaches somehow extend to multi-bounce lighting
indirect illumination introduction 2
Indirect Illumination Introduction 2
  • This section will coverAn efficient and simple DX9-compliant RSM based implementation for smooth one bounce indirect illumination
      • Indirect shadows are ignored here
  • A Direct3D 11 technique that traces rays to compute indirect shadows
      • Part of this technique could generally be used for ray-tracing dynamic scenes
indirect illumination w o indirect shadows
Indirect Illumination w/o Indirect Shadows
  • Draw scene g-buffer
  • Draw Reflective Shadowmap (RSM)
    • RSM shows the part of the scene that recieves direct light from the light source
  • Draw Indirect Light buffer at ¼ res
    • RSM texels are used as light sources on g-buffer pixels for indirect lighting
  • Upsample Indirect Light (IL)
  • Draw final image adding IL
step 1
Step 1
  • G-Buffer needs to allow reconstruction of
    • World/Camera space position
    • World/Camera space normal
    • Color/ Albedo
  • DXGI_FORMAT_R32G32B32A32_FLOAT positions may be required for precise ray queries for indirect shadows
step 2
Step 2
  • RSM needs to allow reconstruction of
    • World space position
    • World space normal
    • Color/ Albedo
  • Only draw emitters of indirect light
  • DXGI_FORMAT_R32G32B32A32_FLOAT position may be required for ray precise queries for indirect shadows
step 3
Step 3
  • Render a ¼ res IL as a deferred op
  • Transform g-buffer pix to RSM space
    • ->Light Space->project to RSM texel space
  • Use a kernel of RSM texels as light sources
    • RSM texels also called Virtual Point Light(VPL)
  • Kernel size depends on
    • Desired speed
    • Desired look of the effect
    • RSM resolution
computing il at a g buf pixel 1
Computing IL at a G-buf Pixel 1

Sum up contribution of all VPLs in the kernel

computing il at a g buf pixel 2
Computing IL at a G-buf Pixel 2

RSM texel/VPL

g-buffer pixel

This term is very similar to terms used in radiosity form factor computations

computing il at a g buf pixel 3
Computing IL at a G-buf Pixel 3

A naive solution for smooth IL needs to consider four VPL kernels with centers at t0, t1, t2 and t3.

stx : sub RSM texel x position [0.0, 1.0[

sty : sub RSM texel y position [0.0, 1.0[

computing il at a g buf pixel 4
Computing IL at a g-buf pixel 4

IndirectLight =

(1.0f-sty) * ((1.0f-stx) * + stx * ) +

(0.0f+sty) * ((1.0f-stx) * + stx * )

Evaluation of 4 big VPL kernels is slow 

VPL kernel at t0

stx : sub texel x position [0.0, 1.0[

VPL kernel at t2

sty : sub texel y position [0.0, 1.0[

VPL kernel at t1

VPL kernel at t3

computing il at a g buf pixel 5
Computing IL at a g-buf pixel 5

SmoothIndirectLight =

(1.0f-sty)*(((1.0f-stx)*(B0+B3)+stx*(B2+B5))+B1)+

(0.0f+sty)*(((1.0f-stx)*(B6+B3)+stx*(B8+B5))+B7)+B4

stx : sub RSM texel x position of g-buf pix [0.0, 1.0[

sty : sub RSM texel y position of g-buf pix [0.0, 1.0[

This trick is probably known to some of you already. See backup for a detailed explanation !

step 4
Step 4
  • Indirect Light buffer is ¼ res
  • Perform a bilateral upsampling step
    • SeePeter-Pike Sloan, Naga K. Govindaraju, Derek Nowrouzezahrai, John Snyder. "Image-Based Proxy Accumulation for Real-Time Soft Global Illumination". Pacific Graphics 2007
  • Result is a full resolution IL
step 5
Step 5
  • Combine
    • Direct Illumination
    • Indirect Illumination
    • Shadows (not mentioned)
combined image
Combined Image

DEMO

~280 FPS on a HD5970 @ 1280x1024

for a 15x15 VPL kernel

Deffered IL pass + bilateral upsampling costs ~2.5 ms

how to add indirect shadows
How to add Indirect Shadows
  • Use a CS and the linked lists technique
    • Insert blockers of IL into 3D grid of lists – let‘s use triangles for now
      • see backup for alternative data structure
  • Look at a kernel of VPLs again
  • Only accumulate light of VPLs that are occluded by blocker tris
    • Trace rays through 3d grid to detect occluded VPLs
    • Render low res buffer only
  • Subtract blocked indirect light from IL buffer
    • Subtract a blurred version of low res version
      • Blur is combined bilateral blurring/upsampling
insert tris into 3d grid of triangle lists
Insert tris into 3D grid of triangle lists

Rasterize dynamic blockers to 3D grid using a CS and atomics

Scene

insert tris into 3d grid of triangle lists1
Insert tris into 3D grid of triangle lists

(0,1,0)

Rasterize dynamic blockers to 3D grid using a CS and atomics

World space 3D grid of triangle lists around IL blockers laid out in a UAV

Scene

eol = End of list (0xffffffff)

indirect light buffer1
Indirect Light Buffer

Blocker of green light

Emitter of green

light

Expected

indirect shadow

final image
Final Image

DEMO

~70 FPS on a HD5970 @ 1280x1024

~300 million rays per second for Indirect Shadows

Ray casting costs ~9 ms

future directions
Future directions
  • Speed up IL rendering
      • Render IL at even lower res
      • Look into multi-res RSMs
  • Speed up ray-tracing
      • Per pixel array of lists for depth buckets
      • Other data structures
  • Raytrace other primitive types
      • Splats, fuzzy ellipsoids etc.
      • Proxy geometry or bounding volumes of blockers
  • Get rid of Interlocked*() ops
      • Just mark grid cells as occupied
      • Lower quality but could work on earlier hardware
      • Need to solve self-occlusion issues
slide63
Q&A

Holger Gruen holger.gruen@AMD.com

Nicolas Thibieroz nicolas.thibieroz@AMD.com

  • Credits for the basic idea of how to implement PPLL under Direct3D 11 go to Jakub Klarowicz (Techland),Holger Gruen and Nicolas Thibieroz (AMD)
ad