Blending and Stencilling

Paul Taylor 2010

Alpha Blending
  • Back-to-Front
  • Disable Depth-Write
    • We still use depth-read to clip polygons
Out of Order Polygons

Figure: good transparency (with OIT) compared with bad transparency (without OIT)

Out of order solutions
  • Depth Peeling
    • The pixels are stripped away like an onion
      • Layer by layer
  • This requires a second depth buffer, used to store each previous ‘layer’

Figure: Layer 0, Layer 1, Layer 2, Layer 3

  • The depth buffer is stored in projected (post-projection) space, which loses a lot of depth precision
  • And now a pretty example:

Figure: the example rendered with 1 layer, 2 layers, 3 layers and 4 layers
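The peeling loop on this slide can be sketched for a single pixel as below. It is a CPU-side illustration under stated assumptions: `Fragment` and `peelLayers` are hypothetical names, and `previousDepth` stands in for the second depth buffer that stores the previous layer.

```cpp
#include <limits>
#include <vector>

// Hypothetical fragment record for one pixel; names are illustrative only.
struct Fragment { float depth; int id; };

// Peel one pixel's fragments into layers, nearest first. Each pass rescans the
// unsorted fragments and keeps the nearest one that lies strictly behind the
// depth peeled by the previous pass (the role of the second depth buffer).
std::vector<Fragment> peelLayers(const std::vector<Fragment>& fragments,
                                 int maxLayers) {
    std::vector<Fragment> layers;
    float previousDepth = -std::numeric_limits<float>::infinity();
    for (int pass = 0; pass < maxLayers; ++pass) {
        bool found = false;
        Fragment nearest{std::numeric_limits<float>::infinity(), -1};
        for (const Fragment& f : fragments) {
            if (f.depth > previousDepth && f.depth < nearest.depth) {
                nearest = f;
                found = true;
            }
        }
        if (!found) break;              // nothing left behind the last layer
        layers.push_back(nearest);
        previousDepth = nearest.depth;  // becomes the "previous layer" depth
    }
    return layers;
}
```

Each layer costs one full pass over the geometry, and fragments that share exactly the same depth collapse into a single layer in this sketch.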

Why a Teapot?

When was it created?

What is it called?

Why was it created?

Why a Teapot?

When was it created? 1975

What is it called? Utah Teapot or Newell Teapot

Why was it created? Newell needed a mathematically describable object to render.

As a universally accessible object, it has proven to be a useful benchmark to date...

The Actual Teapot

The Rendered Teapot

Another Approach to OIT (K-Buffering)
  • This approach focuses on capturing the pixel layers as they are rendered
  • A full-screen shader is then used to sort the pixels.
  • Very similar to other depth-peeling approaches
Nicolas Thibieroz – Reverse Depth Peeling
  • Layers are found back-to-front allowing immediate blending (great for low RAM hardware e.g. Consoles).

1. Determine furthest layer
2. Fill-up depth buffer texture
3. Fill-up normal and colour buffer
4. Do lighting & shadowing
5. Blend in back-buffer
6. Go to 1 for the next layer

For large amounts of transparency, four or more passes may be required! :-S


Depth Peeling via a Bucket Sort
Weighted Averages
  • A cheap shortcut: rather than sorting the layers, it simply accumulates them, weighted by distance

Actual Blending


The Blending Flow in Dx 10

Why Blending...

Blending has become more and more important in Video Games

It allows for more realistic and faster lighting, shadows, and translucent effects

As the uses for blending have increased, so has the complexity of the blending process

Dumbing down the complexities
  • In an idealised world we are dealing with 2 simple components:
    • A Colour Layer (RGB)
    • A Blending Layer (A)
  • This expands to 4 once we decide to have a source and a destination layer:
    • Source Colour Layer (sRGB)
    • Source Blending Layer (sA)
    • Destination Colour Layer (dRGB)
    • Destination Blending Layer (dA)
Why is the Alpha Layer Separate?
  • The Alpha layer can be used in many different ways; the most obvious is as a stencilling / blending control value
  • This comes into play soon...
Now we must add the actual blending function:

Known in Dx Land as D3D10_BLEND_OP:


*Reverse Subtract will subtract source 2 from source 1

This gives us some basic control over how colours are blended together

This leaves us with the following basic equation:

PixelOut = sRGB <BlendOperation> dRGB

0.6, 0.7, 0.8 = 1.0, 1.0, 1.0 REV_SUBTRACT 0.4, 0.3, 0.2

Extending this to include an Alpha Layer
  • The alpha layer has its own blending operations, independent of the colour blending.
  • But how much of the total DX blending technique have we covered?
The Blending Flow in Dx 10

Pixel = sRGB <BlendOperation> dRGB

Pre Blending
  • This allows us to modify how each source pixel is sampled
Source and Destination Blending

Now things are getting complicated

  • We can use SRC and DEST blends in BOTH the source and destination blending settings
  • It is these blends that complete the blending and give Dx 10 its power!
D3D has a LOT of blends!


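D3D10_BLEND_ZERO = 1,
D3D10_BLEND_ONE = 2,
D3D10_BLEND_SRC_COLOR = 3,
D3D10_BLEND_INV_SRC_COLOR = 4,
D3D10_BLEND_SRC_ALPHA = 5,
D3D10_BLEND_INV_SRC_ALPHA = 6,
D3D10_BLEND_DEST_ALPHA = 7,
D3D10_BLEND_INV_DEST_ALPHA = 8,
D3D10_BLEND_DEST_COLOR = 9,
D3D10_BLEND_INV_DEST_COLOR = 10,
D3D10_BLEND_SRC_ALPHA_SAT = 11,
D3D10_BLEND_BLEND_FACTOR = 14, // Blend Factor set through OMSetBlendState
D3D10_BLEND_INV_BLEND_FACTOR = 15, // Blend Factor set through OMSetBlendState
D3D10_BLEND_SRC1_COLOR = 16, // Second source is from Pixel Shader*
D3D10_BLEND_INV_SRC1_COLOR = 17, // Second source is from Pixel Shader*
D3D10_BLEND_SRC1_ALPHA = 18, // Second source is from Pixel Shader*
D3D10_BLEND_INV_SRC1_ALPHA = 19 // Second source is from Pixel Shader*

* Dual-source colour blending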

Matrix Direct Products

What we previously had:

Pixel = sRGB <BlendOperation> dRGB

Adding the new blend to both our source and destination surfaces:

Source = sRGB x sBlendOperation

Destination = dRGB x dBlendOperation

Welcome to Ugly Town:

Pixel = (sRGB x sBlendOperation) <BlendOperation> (dRGB x dBlendOperation)

It looks better as a part of the flow chart:

Utilising these Blends

sRGB = sRGB x sBlendOperation

dRGB = dRGB x dBlendOperation

Pixel = sRGB <BlendOperation> dRGB

Examples of Common Blending Methods
  • No Change Blending
    • sBlend = ZERO
    • dBlend = ONE
    • BlendOp = ADD

Pixel = (sRGB x 0,0,0) + (dRGB x 1,1,1)

Pixel = dRGB


Generating a depth / stencil buffer without drawing or changing the scene

Examples of Common Blending Methods
  • Additive Blending
    • sBlend = ONE
    • dBlend = ONE
    • BlendOp = ADD

Pixel = (sRGB x 1,1,1) + (dRGB x 1,1,1)

Pixel = sRGB + dRGB


Applying a lightmap to a texture

Examples of Common Blending Methods
  • Multiplicative Blending
    • sBlend = ZERO
    • dBlend = SRC_COLOR
    • BlendOp = ADD

Pixel = (sRGB x 0,0,0) + (dRGB x sRGB)

Pixel = dRGB x sRGB


A different approach to applying a lightmap to a texture

Examples of Common Blending Methods
  • Alpha Transparency Blending
    • sBlend = SRC_ALPHA
    • dBlend = INV_SRC_ALPHA
    • BlendOp = ADD

Pixel = (sRGB x sA,sA,sA) + (dRGB x 1-sA,1-sA,1-sA)

If sA = 0.75

Pixel = sRGB x .75 + dRGB x .25

No intensity change, just colour variance (Range still 0 – 1)



In Dx Code

D3D10_BLEND_DESC BlendState;
ZeroMemory(&BlendState, sizeof(D3D10_BLEND_DESC));
BlendState.BlendEnable[0] = TRUE; // enable blending on render target 0
BlendState.SrcBlend = D3D10_BLEND_SRC_ALPHA; // sBlend: scale source colour by sA
BlendState.DestBlend = D3D10_BLEND_INV_SRC_ALPHA; // dBlend: scale destination colour by 1-sA
BlendState.BlendOp = D3D10_BLEND_OP_ADD; // add the two scaled colours
BlendState.SrcBlendAlpha = D3D10_BLEND_ZERO; // the alpha channel has its own blends
BlendState.DestBlendAlpha = D3D10_BLEND_ZERO;
BlendState.BlendOpAlpha = D3D10_BLEND_OP_ADD; // and its own blend operation
BlendState.RenderTargetWriteMask[0] = D3D10_COLOR_WRITE_ENABLE_ALL; // write R, G, B and A
pd3dDevice->CreateBlendState(&BlendState, &g_pBlendState);


Why are the following two lines arrays?

BlendState.BlendEnable[0] = TRUE;

BlendState.RenderTargetWriteMask[0] = D3D10_COLOR_WRITE_ENABLE_ALL; 

Dx 10 can have up to 8 Render Targets

Depending on exactly what you want to capture to each target, shaders can be defined per target

*It will still take 1x pass for each shader

Think about the layered images from last week’s lighting; these could be generated easily with different blends

Integrating the Blend Struct with the Flowchart

BlendState.BlendEnable[0] = TRUE;

BlendState.SrcBlend = D3D10_BLEND_SRC_ALPHA;

BlendState.DestBlend = D3D10_BLEND_INV_SRC_ALPHA;

BlendState.BlendOp = D3D10_BLEND_OP_ADD;

BlendState.SrcBlendAlpha = D3D10_BLEND_ZERO;

BlendState.DestBlendAlpha = D3D10_BLEND_ZERO;

BlendState.BlendOpAlpha = D3D10_BLEND_OP_ADD;

BlendState.RenderTargetWriteMask[0] = D3D10_COLOR_WRITE_ENABLE_ALL; 

Depth Tests and Blending
  • Do we need to disable depth-testing?
    • In almost all cases yes
    • When blending for transparency we need to maintain the draw order, as these blends are non-commutative
    • Basic Additive / Subtractive / Multiplicative blends are commutative
      • Basic being those that do not use alpha blends 
Can we just disable depth testing?
  • No!
  • Basically it’s still good to depth sample the existing rendered surfaces, in case our transparent objects are occluded
  • We can use OMSetDepthStencilState to disable writing to the depth buffer, whilst retaining the ability to read from the depth buffer, and reject obscured pixels
Alpha Clipping Enhanced

Typically if the Alpha of a pixel is 0.0f we can safely discard it. Right?

After a pixel goes through some different mip-mapping and scaling all your 0.0f pixels may be a little different

Adding a clip call will fix this issue:

clip(alpha - 0.24f);

This will drop a pixel from the PS if the value passed to clip is below 0.0f, i.e. if alpha is below 0.24f
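The threshold logic can be mirrored on the CPU to see which alpha values survive. The HLSL `clip` intrinsic discards the pixel when its argument is less than zero; the two helper functions below are hypothetical stand-ins for that behaviour.

```cpp
// Mirrors the rule of HLSL clip(x): the pixel is discarded when x < 0.
bool clipDiscards(float x) { return x < 0.0f; }

// A pixel survives when (alpha - threshold) is not negative.
bool survivesAlphaClip(float alpha, float threshold) {
    return !clipDiscards(alpha - threshold);
}
```

With a threshold of 0.24f, a filtered alpha of 0.02f is discarded while 0.5f is kept. With a threshold of 0.0f, even alpha 0.0f survives, which is why subtracting a small bias is useful.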

Utilising the Stencil Buffer

To Begin : Back to basics

How big is the Stencil Buffer?

8 bits 00 – FF.

The most obvious use of a stencil buffer would require only 1 bit.

So what of all the extra storage we have?


Multiple Stencils

We could generate up to 8 different stencils in a single pass

We could use an incremental stencil buffer to generate a depth-complexity map

(as each polygon is written, the stencil increases value from 0 towards 255)
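A depth-complexity map built this way can be sketched on the CPU with an 8-bit counter per pixel. The rasteriser here is reduced to axis-aligned rectangles, and `Rect` and `depthComplexity` are hypothetical names. The sketch assumes a saturating increment so that the count stops at 255.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a rasterised polygon: a half-open pixel rectangle
// covering [x0, x1) by [y0, y1).
struct Rect { int x0, y0, x1, y1; };

// Every covered pixel increments its 8-bit stencil value, saturating at 255.
std::vector<std::uint8_t> depthComplexity(int width, int height,
                                          const std::vector<Rect>& quads) {
    std::vector<std::uint8_t> stencil(
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 0);
    for (const Rect& q : quads) {
        const int xBegin = std::max(q.x0, 0), xEnd = std::min(q.x1, width);
        const int yBegin = std::max(q.y0, 0), yEnd = std::min(q.y1, height);
        for (int y = yBegin; y < yEnd; ++y) {
            for (int x = xBegin; x < xEnd; ++x) {
                std::uint8_t& s = stencil[static_cast<std::size_t>(y) * width + x];
                if (s < 255) ++s;  // count one more polygon over this pixel
            }
        }
    }
    return stencil;
}
```

The resulting values can be displayed directly as a heat map of overdraw.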

More Complex Blending! 
  • At least we don’t have to deal with Pre-Blending or dual-source blending
  • But Stencilling introduces Masking
  • You already know most of the LHS


The Stencil Buffer does two jobs

  • The Stencil Test
  • Accumulating Values such as z-complexity

The stencil test is the easy part!

Once the stencil has been created.....

Using the Stencil Buffer
  • Stencil Reference Value (StencilRef)

This is a value 0-255 passed in as the StencilState is set

The stencil comparison simplified

Stencil Ref <Comparison> PixelValue

If a pixel fails the Stencil Test it is rejected and will never make it to the depth buffer or the back buffer

So what are the possible comparisons...



  • Stencil Read Mask
    • This allows the comparison to be done on a subset of the stencil bits.

Actual comparison

(Stencil Ref & Mask) <Comparison> (PixelValue & Mask)
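The masked comparison can be written out as a small function. This is an illustrative sketch: `Compare` is a local enum rather than `D3D10_COMPARISON_FUNC`, and `stencilTest` is a hypothetical name. The reference value is placed on the left of the comparison, as on the slide.

```cpp
#include <cstdint>

// Local enum for illustration; not the D3D10_COMPARISON_FUNC type.
enum class Compare { Never, Less, Equal, LessEqual,
                     Greater, NotEqual, GreaterEqual, Always };

// (StencilRef & Mask) <Comparison> (PixelValue & Mask)
bool stencilTest(std::uint8_t ref, std::uint8_t stored,
                 std::uint8_t readMask, Compare func) {
    const std::uint8_t lhs = static_cast<std::uint8_t>(ref & readMask);
    const std::uint8_t rhs = static_cast<std::uint8_t>(stored & readMask);
    switch (func) {
        case Compare::Never:        return false;
        case Compare::Less:         return lhs < rhs;
        case Compare::Equal:        return lhs == rhs;
        case Compare::LessEqual:    return lhs <= rhs;
        case Compare::Greater:      return lhs > rhs;
        case Compare::NotEqual:     return lhs != rhs;
        case Compare::GreaterEqual: return lhs >= rhs;
        case Compare::Always:       return true;
    }
    return false;
}
```

With a read mask of 0x0F only the low four bits take part, so a reference of 0x01 matches a stored value of 0xF1. With a mask of 0xFF the same pair fails an EQUAL test.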

Using the Test

That’s all that you need to USE the stencil buffer, setting it well....

Firstly we need an agenda; creating a mirror surface is an easy one:

For this we will need to mask off the area this ‘mirror’ surface covers

The easiest way to do this is to pass everything, and just render the one surface to the screen

What do we do when the stencil test passes?

Obviously we can draw the pixel, but we also use the stencil test to write a stencil into the buffer

We have three different outcomes from the stencil test:

  • The Stencil Fails
  • The Stencil Passes, but depth fails
  • Both the Stencil and Depth pass
For each outcome there are 8 possible actions



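D3D10_STENCIL_OP_KEEP = 1,
D3D10_STENCIL_OP_ZERO = 2,
D3D10_STENCIL_OP_REPLACE = 3,
D3D10_STENCIL_OP_INCR_SAT = 4, // Clamp
D3D10_STENCIL_OP_DECR_SAT = 5, // Clamp
D3D10_STENCIL_OP_INVERT = 6,
D3D10_STENCIL_OP_INCR = 7, // Wrap
D3D10_STENCIL_OP_DECR = 8 // Wrap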
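How the three outcomes select one of the eight actions can be sketched as follows. `StencilOp`, `StencilOps`, `applyStencilOp` and `updateStencil` are hypothetical names rather than the D3D10 types, and the stencil write mask is left out to keep the sketch short.

```cpp
#include <cstdint>

// Local enum for illustration; not the D3D10_STENCIL_OP type.
enum class StencilOp { Keep, Zero, Replace, IncrSat, DecrSat, Invert, Incr, Decr };

// Apply one action to the stored 8-bit stencil value.
std::uint8_t applyStencilOp(StencilOp op, std::uint8_t stored, std::uint8_t ref) {
    switch (op) {
        case StencilOp::Keep:    return stored;
        case StencilOp::Zero:    return 0;
        case StencilOp::Replace: return ref;
        case StencilOp::IncrSat: return stored == 255 ? stored
                                     : static_cast<std::uint8_t>(stored + 1);
        case StencilOp::DecrSat: return stored == 0 ? stored
                                     : static_cast<std::uint8_t>(stored - 1);
        case StencilOp::Invert:  return static_cast<std::uint8_t>(~stored);
        case StencilOp::Incr:    return static_cast<std::uint8_t>(stored + 1); // wraps 255 -> 0
        case StencilOp::Decr:    return static_cast<std::uint8_t>(stored - 1); // wraps 0 -> 255
    }
    return stored;
}

// One action per outcome: stencil fails, stencil passes but depth fails,
// or both pass.
struct StencilOps { StencilOp onStencilFail, onDepthFail, onPass; };

std::uint8_t updateStencil(const StencilOps& ops, bool stencilPassed,
                           bool depthPassed, std::uint8_t stored,
                           std::uint8_t ref) {
    const StencilOp op = !stencilPassed ? ops.onStencilFail
                       : !depthPassed   ? ops.onDepthFail
                                        : ops.onPass;
    return applyStencilOp(op, stored, ref);
}
```

For the mirror example, a set such as `{Keep, Keep, Replace}` writes the reference value wherever the mirror surface is actually drawn, which marks out the mask area.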

In Dx World...


BOOL DepthEnable; // Duh!

D3D10_DEPTH_WRITE_MASK DepthWriteMask; // ZERO or ALL

D3D10_COMPARISON_FUNC DepthFunc; // Depth Comparison

BOOL StencilEnable; // Stencil In use?

UINT8 StencilReadMask; // Masks

UINT8 StencilWriteMask; // Masks

D3D10_DEPTH_STENCILOP_DESC FrontFace; // Two structs for

D3D10_DEPTH_STENCILOP_DESC BackFace; // Stencil Updates

Dx World Stencil Operations (One for FF Polys, one for BF Polys)


D3D10_STENCIL_OP StencilFailOp;

D3D10_STENCIL_OP StencilDepthFailOp;

D3D10_STENCIL_OP StencilPassOp;


Using the Test


How many Z-Fails?
WTF is a Compute Shader?
  • Simplistically, it’s a way of utilising the GPU’s processing power without direct association to image processing
  • Full support of D3D resources
    • No need to pull data from the GPU just to return it for rendering
    • Post-Processing is possible in the one render loop
Why should you care?
  • Compute Shaders are excellent for:
    • Image Processing (As proven by Photoshop CS5)
    • Particle Simulations
    • Advanced Rendering
      • Ray Tracing, Radiosity Lighting, Renderman
    • Game Physics
    • Game AI
    • A Buffering (Complex AA, Area-Averaged, Acc Buffer)
      • A part of Reyes (Renderman)
    • OIT (Order Independent Transparency)
What the Pixel Shader Does
  • Millions of tiny threads
  • Each has a fixed destination (Pixel)
  • No thread communication
  • Pure Parallelisation
What the Compute Shader Does
  • Thousands of thread groups
  • Sharable Source / Destination within each group
  • Arbitrary writes to video memory
  • Sampling limit is 2GB in Dx11 (16k x 16k)
Addressing the GPU as a slave (Client) Computer
  • You need to set up jobs
  • Request the job to run
  • Retrieve the data
  • Achieved through Data Parallel Processing
CS 5.0 (Dx 11+)
  • Supports cross-thread data sharing
  • Unordered IO operations
  • Irregular data structures
  • 32 KB of shared memory per thread group
  • Ability to create Unordered Access Views that can be accessed by the pixel shader (1D 2D and 3D textures)

PSN meets Xbox Live
OnLive still is not dead!
Crackers make Steam Powered Max Payne 2 Possible
  • A-Buffer: