Stencil Routed A-Buffer

Kevin Myers and Louis Bavoil

NVIDIA


Our Cool Thing


What is it?

  • A-Buffer

    • Simply a list of fragments per-pixel

      • “The A-buffer, an antialiased hidden surface method” [Carpenter 84]

  • Related Work

    • Depth Peeling [Mammen 89] [Everitt 01]

    • k-Buffer [Bavoil et al. 07]


Why do I need this?

  • Often want more than the nearest layer

    • Alpha blending

    • Volume rendering

    • Collision detection

    • Refraction and caustics

    • Global illumination


Why is it hard?

  • GPUs are optimized to capture the nearest layer

    • Z buffering and early z test

    • Fine for most real-time lighting models

    • Wasteful if not rendering front to back


Things that don’t work

  • Blending: can’t just turn off z-buffering

    • Most operations non-commutative

  • MRT

    • Can’t direct output

  • Reading what you’re writing

    • Hazardous

      • “Multi-Layer Depth Peeling via Fragment Sort” [Liu et al. 06]

      • k-Buffer [Bavoil et al. 07]
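
The "non-commutative" point about blending can be illustrated with a small CPU-side sketch. It assumes premultiplied-alpha colours and the standard "over" operator; the `Premul` struct and the `over` and `composite` helpers are hypothetical names introduced here, not part of the presentation.

```cpp
#include <cassert>

// Premultiplied-alpha colour. One channel is enough to show the effect.
struct Premul {
    float c;  // colour already multiplied by alpha
    float a;  // alpha (opacity)
};

// Standard "over" operator: src composited on top of dst.
Premul over(Premul src, Premul dst) {
    return { src.c + (1.0f - src.a) * dst.c,
             src.a + (1.0f - src.a) * dst.a };
}

// Composite two translucent layers over a background.
// nearLayer is the one closest to the viewer.
float composite(Premul nearLayer, Premul farLayer, Premul background) {
    return over(nearLayer, over(farLayer, background)).c;
}
```

With a half-transparent bright layer and a half-transparent dark layer over a black background, swapping the two layers changes the result from 0.5 to 0.25, which is why fragments have to be captured and sorted rather than blended in arrival order.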


A-Buffer

  • “A list of fragments per-pixel”

    • Anything on the GPU that resembles this?

  • MSAA

    • “A list of samples per-pixel”

    • Samples store coverage


MSAA in review

  • Multisampled Antialiasing

    • Fragments are rasterized at a higher res

      • 8xMSAA == 8 x aliased resolution

    • Pixel shader is run once per-pixel

    • Frame buffer storage is at sample resolution
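
Because storage is at sample resolution, the memory cost of an MSAA target scales with the sample count. The arithmetic can be sketched as follows; `msaaBufferBytes` is a hypothetical helper, and the resolution and bytes-per-sample figures used with it are illustrative assumptions rather than numbers from the slides.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a render target that stores `samples` values per pixel.
// bytesPerSample depends on the chosen format (e.g. 8 for two 32-bit channels).
std::uint64_t msaaBufferBytes(std::uint32_t width, std::uint32_t height,
                              std::uint32_t samples, std::uint32_t bytesPerSample) {
    return static_cast<std::uint64_t>(width) * height * samples * bytesPerSample;
}
```

For example, a 1024 x 768 target with 8 samples and 8 bytes per sample needs 50,331,648 bytes (48 MiB), eight times the size of the single-sample target.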


Say What?

  • MSAA samples == A-Buffer pixels??

  • MSAA sample patterns don’t help

  • Need all MSAA samples at pixel center


Line up your Sub-samples

  • Turn off multisampling

    • Still render to an MSAA buffer

    • Pixel shader output bloats to all sub-samples

    • BOOL D3D10_RASTERIZER_DESC::MultisampleEnable

  • Now writing 8 samples per pixel

    • All have the same value!!


Bloating Your Pixel

  • Applause?

  • Meets the definition

    • “List of fragments per-pixel”

  • Not exactly what we want

    • Each item contains same value

    • Next fragment will clobber the entire list

    • Need to update one entry in the list

      • Once and only once
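
The clobbering problem can be modelled on the CPU with one array per pixel. This is only a sketch of the behaviour described above, assuming 8 samples per pixel; `Pixel`, `writeBloated` and `distinctValues` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <set>

constexpr int kSamples = 8;  // 8xMSAA, as in the slides
using Pixel = std::array<float, kSamples>;

// With multisampling disabled, a covering fragment writes its
// pixel shader output to every sample of the pixel.
void writeBloated(Pixel& pixel, float shaderOutput) {
    pixel.fill(shaderOutput);
}

// Number of different values currently stored in the pixel's sample list.
int distinctValues(const Pixel& pixel) {
    return static_cast<int>(std::set<float>(pixel.begin(), pixel.end()).size());
}
```

After two fragments are written, the list still holds only one distinct value (the second fragment), which is the behaviour that routing has to prevent.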


Stencil Routing

Stencil always increments

Stencil passes when 4


Stencil Routing

  • First introduced by [Purcell et al. 03]

    • Did not work for general rasterization

      • Tile aligned points

    • Fat point is spread across four pixels

      • Four pixels get same value

      • Stencil allows one pixel to update


Stencil Routing and MSAA

  • Stencil always operates at sample res

    • Regardless of MultisampleEnable state

    • DX10 Spec

  • Use sub-samples to route

    • Allows any pixel shader output to be routed

      • Arbitrary primitives



A Stencil Test That Works

  • StencilFunc

    • D3D10_COMPARISON_EQUAL

  • StencilRef

    • 2

      • More on this later

  • StencilPassOp and StencilFailOp

    • D3D10_STENCIL_OP_DECR_SAT
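
The effect of this stencil state can be simulated per sample on the CPU. The sketch below is not D3D10 code; it assumes 8 samples per pixel and assumes that sample k starts with stencil value k + 2 (inferred from the initialization and "Why start at 2?" slides). `RoutedPixel`, `makePixel` and `routeFragment` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kSamples = 8;
constexpr std::uint8_t kStencilRef = 2;  // StencilRef from the slide

struct RoutedPixel {
    std::array<std::uint8_t, kSamples> stencil;
    std::array<float, kSamples> color;
};

// Assumption: sample k starts at k + 2, so it equals the reference
// value exactly when the (k+1)-th fragment arrives.
RoutedPixel makePixel() {
    RoutedPixel p{};
    for (int k = 0; k < kSamples; ++k) {
        p.stencil[k] = static_cast<std::uint8_t>(k + kStencilRef);
        p.color[k] = 0.0f;
    }
    return p;
}

// One fragment covering the pixel: the same shader output is offered
// to every sample, but only the sample whose stencil equals the
// reference accepts it (comparison EQUAL).
void routeFragment(RoutedPixel& p, float shaderOutput) {
    for (int k = 0; k < kSamples; ++k) {
        if (p.stencil[k] == kStencilRef) {
            p.color[k] = shaderOutput;
        }
        // Pass op and fail op are both DECR_SAT: decrement, clamped at 0.
        if (p.stencil[k] > 0) {
            --p.stencil[k];
        }
    }
}
```

Routing three fragments in sequence fills samples 0, 1 and 2 with their respective outputs and leaves the remaining samples untouched, so each list entry is updated once and only once.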


Initializing Stencil

  • Clear stencil buffer to pass value ( 2 )

    • Initializes sample 0 to 2

  • Use SampleMask to selectively update

    • Stencil set to replace with reference value
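
The initialization steps can be sketched as a CPU model: one clear, then one masked replace per remaining sample. This assumes 8 samples and assumes sample k should end up at k + 2; `clearStencil`, `replaceMasked` and `initRoutingStencil` are hypothetical stand-ins for the clear call and the masked full-screen passes, not real API functions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kSamples = 8;
using StencilSamples = std::array<std::uint8_t, kSamples>;

// Models clearing the stencil buffer: every sample gets the same value.
void clearStencil(StencilSamples& s, std::uint8_t value) {
    s.fill(value);
}

// Models a pass with the stencil op set to replace: only samples whose
// bit is set in sampleMask receive the reference value.
void replaceMasked(StencilSamples& s, std::uint32_t sampleMask, std::uint8_t ref) {
    for (int k = 0; k < kSamples; ++k) {
        if (sampleMask & (1u << k)) {
            s[k] = ref;
        }
    }
}

// Clear to the pass value (2), then raise each later sample to k + 2.
StencilSamples initRoutingStencil() {
    StencilSamples s{};
    clearStencil(s, 2);
    for (int k = 1; k < kSamples; ++k) {
        replaceMasked(s, 1u << k, static_cast<std::uint8_t>(k + 2));
    }
    return s;
}
```

The clear already leaves sample 0 at the pass value, so only the other seven samples need a masked update.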


Why start at 2?

  • When all sub-samples are written

    • Most stencil values will be 0

      • Except the last one written

    • Last sample written stencil == 1

  • When overflow occurs

    • All stencil values will be 0
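
The two cases above follow from simple counting, which the sketch below reproduces. It assumes 8 samples, sample k initialized to k + 2, and a saturating decrement on every sample for every fragment; `stencilAfter` and `overflowed` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kSamples = 8;
using StencilSamples = std::array<std::uint8_t, kSamples>;

// Stencil values of one pixel after `fragments` fragments have hit it.
StencilSamples stencilAfter(int fragments) {
    StencilSamples s{};
    for (int k = 0; k < kSamples; ++k) {
        s[k] = static_cast<std::uint8_t>(k + 2);  // assumed initial value
    }
    for (int f = 0; f < fragments; ++f) {
        for (auto& v : s) {
            if (v > 0) --v;  // decrement with saturation at 0
        }
    }
    return s;
}

// The last sample reaches 0 only if more fragments arrived than samples.
bool overflowed(const StencilSamples& s) {
    return s[kSamples - 1] == 0;
}
```

Starting at 2 rather than 1 is what keeps the last sample at 1 when the list is exactly full, so a value of 0 there unambiguously signals overflow.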


Occlusion Query Test

Pixel did not overflow

Pixel overflowed


Handling Overflow

  • Set sample mask to last sample updated

  • Draw full screen quad

    • Issue an occlusion query

    • Set stencil to pass if stencil == 0

  • Check occlusion query

    • Sample pass count == overflow count
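
What the occlusion query ends up counting can be modelled on the CPU. This sketch assumes 8 samples and that the last sample starts at 9 (that is, k + 2 for k = 7) and loses one per fragment, saturating at 0; `lastSampleStencil` and `countOverflowedPixels` are hypothetical names, and the per-pixel fragment counts are made-up inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kSamples = 8;

// Stencil value of the last sample after `fragments` fragments hit the pixel.
std::uint8_t lastSampleStencil(int fragments) {
    const int v = (kSamples + 1) - fragments;
    return static_cast<std::uint8_t>(v < 0 ? 0 : v);
}

// Models the full-screen quad with the sample mask set to the last sample
// and the stencil test passing when stencil == 0. The number of passing
// samples is the number of pixels that overflowed.
int countOverflowedPixels(const std::vector<int>& fragmentsPerPixel) {
    int passed = 0;
    for (int n : fragmentsPerPixel) {
        if (lastSampleStencil(n) == 0) {
            ++passed;
        }
    }
    return passed;
}
```

A pixel that received exactly 8 fragments is not counted, while one that received 9 or more is, matching the two cases on the "Why start at 2?" slide.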


Handling Overflow

  • Occlusion query

    • Good

      • Very fast

      • Allows for dynamic A-Buffer sizing

    • Bad

      • Requires some CPU intervention

        • Ideally A-Buffer size is fixed


Demo Time!

Demo


Secrets of the Dragon

  • Single A-Buffer

    • RG32F

      • R is packed color

      • G is depth

    • Saves on texture loads

  • Post process sort

    • 8-fragment per-pixel bitonic sort

      • Additional fragments, insertion sort
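
The packed layout and the 8-element sort can be sketched on the CPU as follows. The slides do not say how the colour is packed, so the RGBA8 layout here is an assumption, as are the names `Fragment`, `packRGBA8`, `unpackChannel` and `bitonicSort8`; the sort itself is the textbook bitonic network for 8 keys, not necessarily the exact shader used in the demo.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

struct Fragment {
    std::uint32_t packedColor;  // assumed RGBA8 packed into 32 bits ("R")
    float depth;                // fragment depth ("G")
};

std::uint32_t packRGBA8(std::uint8_t r, std::uint8_t g,
                        std::uint8_t b, std::uint8_t a) {
    return std::uint32_t(r) | (std::uint32_t(g) << 8) |
           (std::uint32_t(b) << 16) | (std::uint32_t(a) << 24);
}

// channel: 0 = r, 1 = g, 2 = b, 3 = a
std::uint8_t unpackChannel(std::uint32_t packed, int channel) {
    return static_cast<std::uint8_t>((packed >> (8 * channel)) & 0xFFu);
}

// Bitonic sorting network for exactly 8 fragments, ascending by depth.
// The compare-exchange pattern is fixed and independent of the data.
void bitonicSort8(std::array<Fragment, 8>& f) {
    const int n = 8;
    for (int k = 2; k <= n; k *= 2) {
        for (int j = k / 2; j > 0; j /= 2) {
            for (int i = 0; i < n; ++i) {
                const int partner = i ^ j;
                if (partner <= i) continue;  // handle each pair once
                const bool ascending = (i & k) == 0;
                const bool outOfOrder = ascending
                    ? f[i].depth > f[partner].depth
                    : f[i].depth < f[partner].depth;
                if (outOfOrder) std::swap(f[i], f[partner]);
            }
        }
    }
}
```

Sorting whole `Fragment` values keeps each colour attached to its depth. The result is ordered front to back, so a back-to-front alpha blend would walk the array in reverse.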


8800 GTX Performance

Alpha Blended Stanford Dragon


Limits…DOH!

  • 254 layers of depth max

    • 8-bit stencil ( 255 – 1 for overflow bit )

    • If you do this, call us, because that’s crazy

  • Fragments at same depth

    • Must be handled in post-process

  • MSAA


Summary

  • Stencil Routed A-Buffer

    • Ideally suited for complex geometries

      • Much faster than depth peeling

  • A-buffer can be dynamically resized

    • Use an occlusion query

    • Best to pre-determine size


Future Work

  • Render target arrays

    • Each target has its own stencil buffer

    • Target replaces sub-sample

      • Or augments sub-sample

    • #arrays * MSAA level in one “CPU pass”

      • With DX10, saturates 254 layers

    • Use instancing for additional “GPU passes”
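
The capacity estimate above is a product capped by the stencil limit. A minimal sketch, assuming the 254-layer ceiling from the Limits slide applies unchanged; `layersPerPass` is a hypothetical helper and the slice counts used with it are illustrative.

```cpp
#include <algorithm>
#include <cassert>

// 8-bit stencil: 255 minus 1 value reserved for overflow detection.
constexpr int kMaxStencilLayers = 254;

// Layers that could be captured in one pass when every array slice
// contributes `msaaSamples` list entries per pixel.
int layersPerPass(int arraySlices, int msaaSamples) {
    return std::min(arraySlices * msaaSamples, kMaxStencilLayers);
}
```

With 8 samples per slice, 4 slices would give 32 layers, and any slice count of 32 or more would already hit the 254-layer ceiling.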


Thanks for all the fish

  • Claudio Silva, Steven Callahan, Joao Comba, Aaron Lefohn, Cass Everitt, Peach Myers


The last slide…

  • ?

    • [email protected]

    • [email protected]

