Blending and Stencilling

Paul Taylor 2010

Alpha blending

Alpha Blending

  • Back-to-Front

  • Disable Depth-Write

    • We still use depth-read to clip polygons

Out of order polygons

Out of Order Polygons

Good Transparency (with OIT)

Bad Transparency (without OIT)

Out of order solutions

  • Depth Peeling

    • The pixels are stripped away like an onion

      • Layer by layer

  • This requires a second depth buffer, used to store each previous ‘layer’

Layer 0

Layer 1

Layer 2

Layer 3



  • The depth buffer is stored in Projected Pixels, losing a lot of data accuracy

  • And now a pretty example:

1 layer

2 layers

3 layers

4 layers

Why a teapot

Why a Teapot?

When was it created? 1975

What is it called? Utah Teapot or Newell Teapot

Why was it created? Newell needed a mathematically describable object to render.

As a universally accessible object, it has proven to be a useful benchmark to date...

The actual teapot

The Actual Teapot

The Rendered Teapot

Another approach to oit k buffering

Another Approach to OIT (K-Buffering)

  • This approach focuses on capturing the pixel layers as they are rendered

  • Next, utilising a full-screen shader to sort the pixels.

  • Very similar to other depth-peeling approaches

Nicolas thibieroz reverse depth peeling

Nicolas Thibieroz – Reverse Depth Peeling

  • Layers are found back-to-front allowing immediate blending (great for low RAM hardware e.g. Consoles).

    1. Determine furthest layer

    2. Fill-up depth buffer texture

    3. Fill-up normal and colour buffer

    4. Do lighting & shadowing

    5. Blend in back-buffer

    6. Go to 1 for the next layer

    For large amounts of transparency, four or more passes may be required! :-S

Depth peeling via a bucket sort

Depth Peeling via a Bucket Sort

Weighted averages

Weighted Averages


  • A cheap-out: it ignores the need to sort layers and simply accumulates the colours, weighted by distance

Actual blending

Actual Blending

The blending flow in dx 10

The Blending Flow in Dx 10

Why blending

Why Blending...

Blending has become more and more important in Video Games

It allows for more realistic and faster lighting, shadows, and translucent effects

As the uses for blending have increased, so has the complexity of the blending process

Dumbing down the complexities

  • In an idealised world we are dealing with 2 simple components:

    • A Colour Layer (RGB)

    • A Blending Layer (A)

  • This expands to 4 once we decide to have a source and a destination layer:

    • Source Colour Layer (sRGB)

    • Source Blending Layer (sA)

    • Destination Colour Layer (dRGB)

    • Destination Blending Layer (dA)

Why is the Alpha Layer Separate?

  • The Alpha layer can be used in many different ways; the most obvious is as a stencilling / blending control value

  • This comes into play soon...

Now we must add the actual blending function

Now we must add the actual blending function:

Known in Dx Land as D3D10_BLEND_OP:


*Reverse Subtract will subtract source 2 from source 1

This gives us some basic control over how colours are blended together

This leaves us with the following basic equation:

PixelOut = sRGB <BlendOperation> dRGB

0.6, 0.7, 0.8 = 1.0, 1.0, 1.0 REV_SUBTRACT 0.4, 0.3, 0.2

Extending this to include an alpha layer

Extending this to include an Alpha Layer

  • The alpha layer has its own blending operations, independent of the colour blending.

  • But how much of the total DX blending technique have we covered?

The Blending Flow in Dx 10

Pixel = sRGB <BlendOperation> dRGB

Pre blending

Pre Blending

  • This allows us to modify how each source pixel is sampled

Source and destination blending

Source and Destination Blending

Now things are getting complicated

  • We can use SRC and DEST blends in BOTH the source and destination blending settings

  • It is these blends that complete the blending and give Dx 10 its power!

D3d has a lot of blends

D3D has a LOT of blends!


D3D10_BLEND_ONE = 2,


















Blend Factor set through OMSetBlendState


Second source is from Pixel Shader*

* Dual-source colour blending

Matrix direct products

Matrix Direct Products

What we previously had:

Pixel = sRGB <BlendOperation> dRGB

Adding the new blend to both our source and destination surfaces:

Source = sRGB x sBlendOperation

Destination = dRGB x dBlendOperation

Welcome to ugly town

Welcome to Ugly Town:

Pixel = (sRGB x sBlendOperation) <BlendOperation> (dRGB x dBlendOperation)

It looks better as a part of the flow chart:

Utilising these blends

Utilising these Blends

sRGB = sRGB x sBlendOperation

dRGB = dRGB x dBlendOperation

Pixel = sRGB <BlendOperation> dRGB

Examples of common blending methods

Examples of Common Blending Methods

  • No Change Blending

    • sBlend = ZERO

    • dBlend = ONE

    • BlendOp = ADD

      Pixel = (sRGB x 0,0,0) + (dRGB x 1,1,1)

      Pixel = dRGB


      Generating a depth / stencil buffer without drawing or changing the scene

Examples of Common Blending Methods

  • Additive Blending

    • sBlend = ONE

    • dBlend = ONE

    • BlendOp = ADD

      Pixel = (sRGB x 1,1,1) + (dRGB x 1,1,1)

      Pixel = sRGB + dRGB


      Applying a lightmap to a texture

Examples of Common Blending Methods

  • Multiplicative Blending

    • sBlend = ZERO

    • dBlend = SRC_COLOR

    • BlendOp = ADD

      Pixel = (sRGB x 0,0,0) + (dRGB x sRGB)

      Pixel = dRGB x sRGB


      A different approach to applying a lightmap to a texture

Examples of Common Blending Methods

  • Alpha Transparency Blending

    • sBlend = SRC_ALPHA

    • dBlend = INV_SRC_ALPHA

    • BlendOp = ADD

      Pixel = (sRGB x sA,sA,sA) + (dRGB x 1-sA,1-sA,1-sA)

      If sA = 0.75

      Pixel = sRGB x .75 + dRGB x .25

      No intensity change, just colour variance (Range still 0 – 1)



In dx code

In Dx Code

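ID3D10BlendState* g_pBlendState = NULL; // Receives the created state object

D3D10_BLEND_DESC BlendState;

ZeroMemory(&BlendState, sizeof(D3D10_BLEND_DESC));

BlendState.BlendEnable[0] = TRUE; // Enable blending on render target 0

BlendState.SrcBlend = D3D10_BLEND_SRC_ALPHA; // sRGB x sA

BlendState.DestBlend = D3D10_BLEND_INV_SRC_ALPHA; // dRGB x (1 - sA)

BlendState.BlendOp = D3D10_BLEND_OP_ADD; // Add the two products

BlendState.SrcBlendAlpha = D3D10_BLEND_ZERO; // Alpha channel: source contributes nothing

BlendState.DestBlendAlpha = D3D10_BLEND_ZERO; // Alpha channel: destination contributes nothing

BlendState.BlendOpAlpha = D3D10_BLEND_OP_ADD;

BlendState.RenderTargetWriteMask[0] = D3D10_COLOR_WRITE_ENABLE_ALL; // Write R, G, B and A

pd3dDevice->CreateBlendState(&BlendState, &g_pBlendState);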

Why are the following two lines arrays?

BlendState.BlendEnable[0] = TRUE;

BlendState.RenderTargetWriteMask[0] = D3D10_COLOR_WRITE_ENABLE_ALL; 

Dx 10 can have up to 8 Render Targets

Depending on exactly what you want to capture to each target, shaders can be defined per target

*It will still take 1x pass for each shader

Think about the layered images from last week's lighting; these could be generated easily with different blends

Integrating the blend struct with the flowchart

Integrating the Blend Struct with the Flowchart

BlendState.BlendEnable[0] = TRUE;

BlendState.SrcBlend = D3D10_BLEND_SRC_ALPHA;

BlendState.DestBlend = D3D10_BLEND_INV_SRC_ALPHA;

BlendState.BlendOp = D3D10_BLEND_OP_ADD;

BlendState.SrcBlendAlpha = D3D10_BLEND_ZERO;

BlendState.DestBlendAlpha = D3D10_BLEND_ZERO;

BlendState.BlendOpAlpha = D3D10_BLEND_OP_ADD;

BlendState.RenderTargetWriteMask[0] = D3D10_COLOR_WRITE_ENABLE_ALL; 

Depth tests and blending

Depth Tests and Blending

  • Do we need to disable depth-testing?

    • In almost all cases yes

    • When transparency blending we need to maintain the draw order as blends are non-commutative

    • Basic Additive / Subtractive / Multiplicative blends are commutative

      • Basic being those that do not use alpha blends 

Can we just disable depth testing

Can we just disable depth testing?

  • No!

  • Basically it’s still good to depth sample the existing rendered surfaces, in case our transparent objects are occluded

  • We can use OMSetDepthStencilState to disable writing to the depth buffer, whilst retaining the ability to read from the depth buffer and reject obscured pixels

Alpha clipping enhanced

Alpha Clipping Enhanced

Typically, if the alpha of a pixel is 0.0f we can safely discard it. Right?

After a pixel goes through mip-mapping and scaling, your 0.0f pixels may no longer be exactly 0.0f

Adding a clip call will fix this issue:

clip(alpha - 0.24f);

This will drop a pixel from the PS if the value passed to clip is below 0.0f, i.e. if alpha is below 0.24f
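
The threshold logic can be mirrored in a one-line C++ helper. HLSL's `clip` discards the pixel when its argument is less than zero; `survivesClip` is a hypothetical stand-in for that rule, not a real API.

```cpp
// Mirrors clip(alpha - threshold): the pixel survives unless the argument is negative.
bool survivesClip(float alpha, float threshold) {
    return !(alpha - threshold < 0.0f);
}
```

So with a threshold of 0.24f, a filtered texel whose alpha has drifted from 0.0f up to 0.1f is still discarded, while a half-transparent texel at 0.5f is kept.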

Stencilling: Part of the Output-Merger Stage

Paul Taylor 2010

Utilising the stencil buffer

Utilising the Stencil Buffer

To Begin : Back to basics

How big is the Stencil Buffer?

8 bits 00 – FF.

The most obvious use of a stencil buffer would require only 1 bit.

So what of all the extra storage we have?

Multiple Stencils

We could generate up to 8 different stencils in a single pass

We could use an incremental stencil buffer to generate a depth-complexity map

(as each polygon is written, the stencil increases in value from 0 towards 255)
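
A depth-complexity map can be sketched on the CPU by treating each polygon as an axis-aligned rectangle and incrementing an 8-bit counter per covered pixel. The `Rect` type and `depthComplexity` function are hypothetical, and the counter is assumed to saturate at 255 rather than wrap.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a polygon: covers x0 <= x < x1 and y0 <= y < y1.
struct Rect { int x0, y0, x1, y1; };

// Returns one 8-bit count per pixel: how many polygons were written there.
std::vector<std::uint8_t> depthComplexity(int width, int height,
                                          const std::vector<Rect>& polys) {
    std::vector<std::uint8_t> stencil(static_cast<std::size_t>(width) * height, 0);
    for (const Rect& p : polys) {
        // Clip the rectangle to the buffer.
        const int x0 = std::max(p.x0, 0), x1 = std::min(p.x1, width);
        const int y0 = std::max(p.y0, 0), y1 = std::min(p.y1, height);
        for (int y = y0; y < y1; ++y) {
            for (int x = x0; x < x1; ++x) {
                std::uint8_t& s = stencil[static_cast<std::size_t>(y) * width + x];
                if (s < 255) ++s;  // saturating increment
            }
        }
    }
    return stencil;
}
```

Pixels with high counts are where overdraw is heaviest.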

More complex blending

More Complex Blending! 

  • At least we don’t have to deal with Pre-Blending or dual-source blending

  • But Stencilling introduces Masking



  • You already know most of the LHS



The Stencil Buffer does two jobs

  • The Stencil Test

  • Accumulating Values such as z-complexity

    The stencil test is the easy part!

    Once the stencil has been created.....

Using the stencil buffer

Using the Stencil Buffer

  • Stencil Reference Value (StencilRef)

    This is a value 0-255 passed in when the depth-stencil state is set

    The stencil comparison simplified

    Stencil Ref <Comparison> PixelValue

    If a pixel fails the Stencil Test it is rejected and will never make it to the depth buffer or the back buffer

    So what are the possible comparisons...






  • Stencil Read Mask

    • This allows the comparison to be done on a subset of the stencil bits.

      Actual comparison

      (Stencil Ref & Mask) <Comparison> (PixelValue & Mask)
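
The masked comparison can be written out as a small C++ function. This is a CPU sketch: `Compare` mirrors the eight D3D10_COMPARISON_FUNC choices by name only, and `stencilTest` is a hypothetical helper, not a Direct3D call.

```cpp
#include <cstdint>

// The eight comparison choices (illustrative names).
enum class Compare { Never, Less, Equal, LessEqual, Greater, NotEqual, GreaterEqual, Always };

// (StencilRef & Mask) <Comparison> (PixelValue & Mask)
bool stencilTest(std::uint8_t stencilRef, std::uint8_t pixelValue,
                 std::uint8_t readMask, Compare func) {
    const std::uint8_t ref    = static_cast<std::uint8_t>(stencilRef & readMask);
    const std::uint8_t stored = static_cast<std::uint8_t>(pixelValue & readMask);
    switch (func) {
        case Compare::Never:        return false;
        case Compare::Less:         return ref <  stored;
        case Compare::Equal:        return ref == stored;
        case Compare::LessEqual:    return ref <= stored;
        case Compare::Greater:      return ref >  stored;
        case Compare::NotEqual:     return ref != stored;
        case Compare::GreaterEqual: return ref >= stored;
        case Compare::Always:       return true;
    }
    return false;
}
```

With a read mask of 0x0F, a reference of 0x01 matches a stored value of 0xF1, because only the low four bits take part. That is how several independent stencils can share the same 8 bits.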

Using the test

Using the Test

That’s all that you need to USE the stencil buffer, setting it well....

Firstly we need an agenda; creating a mirror surface is an easy one:

For this we will need to mask off the area this ‘mirror’ surface covers

The easiest way to do this is to pass everything, and just render the one surface to the screen

What do we do when the stencil test passes

What do we do when the stencil test passes?

Obviously we can draw the pixel, but we also use the stencil test to write a stencil into the buffer

We have three different outcomes from the stencil test:

  • The Stencil Fails

  • The Stencil Passes, but depth fails

  • Both the Stencil and Depth pass

For each outcome there are 8 possible actions



D3D10_STENCIL_OP_DECR = 8 // Wrap

In dx world

In Dx World...


BOOL DepthEnable; // Duh!

D3D10_DEPTH_WRITE_MASK DepthWriteMask; // ZERO or ALL

D3D10_COMPARISON_FUNC DepthFunc; // Depth Comparison

BOOL StencilEnable; // Stencil In use?

UINT8 StencilReadMask; // Masks

UINT8 StencilWriteMask; // Masks

D3D10_DEPTH_STENCILOP_DESC FrontFace; // Two structs for

D3D10_DEPTH_STENCILOP_DESC BackFace; // Stencil Updates

Dx world stencil operations one for ff polys one for bf polys

Dx World Stencil Operations (One for FF Polys, one for BF Polys)


D3D10_STENCIL_OP StencilFailOp;

D3D10_STENCIL_OP StencilDepthFailOp;

D3D10_STENCIL_OP StencilPassOp;
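
How the three outcomes select one of the eight actions can be sketched in C++. This is a CPU model with hypothetical names (`StencilOp`, `StencilOpDesc`, `applyOp`, `updateStencil`) that mirror the Direct3D fields above; the write-mask handling is included on the assumption that only masked bits are updated.

```cpp
#include <cstdint>

// The eight stencil actions (illustrative names).
enum class StencilOp { Keep, Zero, Replace, IncrSat, DecrSat, Invert, Incr, Decr };

// One action per outcome, as in the front-face / back-face structs.
struct StencilOpDesc {
    StencilOp stencilFailOp;       // stencil test failed
    StencilOp stencilDepthFailOp;  // stencil passed, depth failed
    StencilOp stencilPassOp;       // both passed
};

std::uint8_t applyOp(StencilOp op, std::uint8_t stored, std::uint8_t ref) {
    switch (op) {
        case StencilOp::Keep:    return stored;
        case StencilOp::Zero:    return 0;
        case StencilOp::Replace: return ref;
        case StencilOp::IncrSat: return stored == 255 ? stored : static_cast<std::uint8_t>(stored + 1);
        case StencilOp::DecrSat: return stored == 0 ? stored : static_cast<std::uint8_t>(stored - 1);
        case StencilOp::Invert:  return static_cast<std::uint8_t>(~stored);
        case StencilOp::Incr:    return static_cast<std::uint8_t>(stored + 1);  // wraps 255 -> 0
        case StencilOp::Decr:    return static_cast<std::uint8_t>(stored - 1);  // wraps 0 -> 255
    }
    return stored;
}

// Picks the action for the outcome, then writes only the bits allowed by writeMask.
std::uint8_t updateStencil(const StencilOpDesc& desc, bool stencilPassed, bool depthPassed,
                           std::uint8_t stored, std::uint8_t ref, std::uint8_t writeMask) {
    const StencilOp op = !stencilPassed ? desc.stencilFailOp
                       : !depthPassed   ? desc.stencilDepthFailOp
                                        : desc.stencilPassOp;
    const std::uint8_t updated = applyOp(op, stored, ref);
    return static_cast<std::uint8_t>((stored & ~writeMask) | (updated & writeMask));
}
```

For the mirror example, a pass-everything test with `stencilPassOp` set to `Replace` and a reference of 1 marks every pixel the mirror surface covers.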


Using the test1

Using the Test

How many z fails

How many Z-Fails?

Compute shaders

Compute Shaders

Wtf is a compute shader

WTF is a Compute Shader

  • Simplistically, it’s a way of utilising the GPU’s processing power without direct association to image processing

  • Full support of D3D resources

    • No need to pull data from the GPU just to return it for rendering

    • Post-Processing is possible in the one render loop

Why should you care

Why should you care?

  • Compute Shaders are excellent for:

    • Image Processing (As proven by Photoshop CS5)

    • Particle Simulations

    • Advanced Rendering

      • Ray Tracing, Radiosity Lighting, Renderman

    • Game Physics

    • Game AI

    • A Buffering (Complex AA, Area-Averaged, Acc Buffer)

      • A part of Reyes (Renderman)

    • OIT (Order Independent Transparency)

What the pixel shader does

What the Pixel Shader Does

  • Millions of tiny threads

  • Each has a fixed destination (Pixel)

  • No thread communication

  • Pure Parallelisation

What the Compute Shader Does

  • Thousands of thread groups

  • Sharable Source / Destination within each group

  • Arbitrary writes to video memory

  • Sampling limit is 2GB in Dx11 (16k x 16k)

Addressing the gpu as a slave client computer

Addressing the GPU as a slave (Client) Computer

  • You need to set up jobs

  • Request the job to run

  • Retrieve the data



  • Achieved through Data Parallel Processing

Cs 5 0 dx 11

CS 5.0 (Dx 11+)

  • Supports cross-thread data sharing

  • Unordered IO operations

  • Irregular data structures

  • 32 KB of shared memory per thread group

  • Ability to create Unordered Access Views that can be accessed by the pixel shader (1D 2D and 3D textures)

Psn meets xbox live

PSN meets Xbox Live


Onlive still is not dead

OnLive still is not dead!


Crackers make steam powered max payne 2 possible

Crackers make Steam Powered Max Payne 2 Possible





  • A-Buffer:



