Allegorithmic Substance




Allegorithmic Substance

Threaded Middleware

Procedural textures on multi-core
  • Other than framerate and features, what else can you do with extra CPU power?
  • We’ll look at Allegorithmic’s middleware, Substance
Procedural textures are valuable for modern games
  • Have a LOT of textures.
  • Want shorter loading times (faster starts, teleportations or zooms).
  • Need to reduce texture size on disc, for download, and/or in RAM.
  • Can benefit from more flexible and reusable assets.
Introducing Substance
  • In Q2 2007 Allegorithmic started a complete re-engineering of ProFX2, both authoring tool and engine, under the name Substance.
  • Unit tests were written very early to ensure that Substance could target streaming.
  • Cross-platform: PC, PS3, Xbox, etc.
  • Linear multi-thread scalability is expected.
What is Substance?
  • Substance is a middleware product composed of two elements.
  • Substance Authoring Tool lets you:
    • create procedural textures
    • create texture packages of a few kilobytes!
    • A cooker compiles generic data into binaries optimized for a specific platform or user.
  • Substance Engine
    • generates bitmap textures on the fly.
Less FPS?
  • More textures, not less FPS
    • Substance consumes idle cycles, not frames
  • Graphics bitrates follow Moore's law
    • Higher poly count → bigger worlds
    • Higher filter rate → larger textures
    • Desired texture volume grows faster than RAM
  • Streaming is a necessity
    • But HDD net bitrate does not keep up. Bottleneck!
  • Modern gameplay entails sudden bitrate bursts
    • This is worsened by HDD seeks and entails stalls.
No, a stable and high FPS.
  • Even when masked, a stall is actually an FPS drop
  • Substance works in Random Access Memory
  • The gamer zooms or teleports:
    • Give 4 cores and a GPU to Substance
    • Sacrifice 1 or 2 frames
    • Substance generates and caches 1-2M new texels.
    • The stall does not hinder gameplay.
  • Substance diminishes stalls
  • Substance helps to maintain a high FPS.
Performance issue: streaming in games
  • DVD or HDD net bitrate is 2 or 6 MB/s, respectively
  • Our aim: add a stable 4 MB/s without the GPU
  • Requires billions of intermediate pixels/s.
  • Can CPUs compete with GPUs?
  • Opportunity: cores are still under-exploited in most game engines.
  • Texture processing is privileged in the new multi-core architectures.
The architecture was designed with these issues in mind:
  • Homogeneous CPU and GPU versions
  • Streaming (~1-10 CPU cycles per pixel)
  • SIMD & MT for the multi-core generations
  • No cache or threading pollution
  • Fine-grained jobs and lockless synchronization
  • Low memory footprint
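
The per-pixel cycle budget above puts a ceiling on throughput. The following is a minimal sketch of that arithmetic, assuming a 3 GHz core clock (an illustrative assumption, not a figure from the slides). The helper name `pixels_per_second` is hypothetical.

```python
def pixels_per_second(clock_hz: float, cycles_per_pixel: float, cores: int = 1) -> float:
    """Upper bound on pixel throughput for a given per-pixel cycle budget."""
    if cycles_per_pixel <= 0:
        raise ValueError("cycles_per_pixel must be positive")
    return clock_hz / cycles_per_pixel * cores


ASSUMED_CLOCK_HZ = 3.0e9  # assumption: 3 GHz core, not stated in the slides

if __name__ == "__main__":
    # The two ends of the ~1-10 cycles/pixel budget quoted on the slide.
    for budget in (1, 10):
        rate = pixels_per_second(ASSUMED_CLOCK_HZ, budget)
        print(f"{budget:>2} cycles/pixel -> {rate / 1e9:.1f} G pixels/s per core")
```

Under that assumed clock, the budget corresponds to roughly 0.3 to 3 G pixels/s per core, which is the order of magnitude needed when billions of intermediate pixels per second are required.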
The theoretical benefit was calculated
  • New architectures come with enhanced SIMD. Expected gain: x10 compared to standard C++.
  • Tricks and algorithmic changes could give another x10 on some filters, such as DXT compression.
  • We were confident that our image processing could be threaded well, partly because we generate textures asynchronously.
  • Hence the CPU version of ProFX2 could be accelerated by a factor of x25-x100.
This is the approach taken to address the issue:
  • Simple inner-loop tests showed that optimized SSE2-4 code could give a boost of x10
  • Find a data layout coherent with micro parallelism (SIMD and pipeline), low level threading, cache and memory handling.
  • OpenMP is then used to test strategies before designing a specific MT HAL
Here’s the code that was developed to make this possible:
  • A SIMD HAL is ready for PC, Xbox, PS3.
  • OpenMP easily gives an 85% MT linearity.
  • Our MT HAL is converging towards a model of lockless synchronization, with 95% linearity expected.
  • The cooker precomputes data that will help synchronization and MT efficiency.
  • Our API exposes asynchronous commands. Perfect for sharing cores with a game loop!
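
The slides do not show the API itself, so the following is only a sketch of what "asynchronous commands" sharing cores with a game loop can look like. Every name here (`AsyncTextureEngine`, `request`, `run_frames`, `generate_texture`) is hypothetical and is not the Substance API. Python's standard `concurrent.futures` pool stands in for the engine's own threading layer.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Iterable, Tuple


def generate_texture(name: str, size: int) -> bytes:
    """Stand-in for procedural generation: size*size greyscale bytes."""
    seed = sum(name.encode()) % 256
    return bytes((seed + x + y) % 256 for y in range(size) for x in range(size))


class AsyncTextureEngine:
    """Hypothetical asynchronous front end (not the Substance API)."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def request(self, name: str, size: int) -> "Future[bytes]":
        # Returns at once; the texture is generated on a worker thread.
        return self._pool.submit(generate_texture, name, size)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def run_frames(engine: AsyncTextureEngine,
               wanted: Iterable[Tuple[str, int]],
               max_frames: int = 100) -> Dict[str, bytes]:
    """Toy game loop: issue requests, then poll once per frame."""
    pending = {name: engine.request(name, size) for name, size in wanted}
    ready: Dict[str, bytes] = {}
    for _ in range(max_frames):
        if not pending:
            break
        # A real game would simulate and render the frame here.
        for name in [n for n, f in pending.items() if f.done()]:
            ready[name] = pending.pop(name).result()
        time.sleep(0.001)
    # Block on anything still outstanding so the sketch always terminates.
    for name, future in pending.items():
        ready[name] = future.result()
    return ready


if __name__ == "__main__":
    engine = AsyncTextureEngine()
    textures = run_frames(engine, [("brick", 8), ("moss", 4)])
    print({name: len(data) for name, data in textures.items()})
    engine.shutdown()
```

The point of the design is that the loop never waits on a texture while it has frames to produce. It picks up finished results when they are ready.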
The compositing graph: node-based image processing
  • Authoring Tool: non-linear editing
  • Engine: efficient high level structure
  • Graph (DAG) contains 3 types of nodes:
    • Sources: procedural noise, bitmaps, SVGs
    • Filters: blend, HSL, TRS, warp, blur, etc.
    • Outputs: coherent diffuse & normal maps, etc.
  • Main advantages:
    • Libraries, capsules: instantiation of subgraphs
    • Complex variants: fast to create and compute
    • Dynamic custom branches (e.g. aging textures)
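
A minimal sketch of such a graph, with sources, filters and outputs evaluated on demand. The `Graph` class, the node names and the toy one-dimensional "images" are all illustrative assumptions. The slides do not describe the engine's actual data structures.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[float]  # flat greyscale image, values in [0, 1]


class Graph:
    """Tiny DAG: each node is a function plus the names of its inputs."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Tuple[Callable[..., Image], Tuple[str, ...]]] = {}

    def add(self, name: str, func: Callable[..., Image], *inputs: str) -> None:
        # Inputs must already exist and names are unique, so no cycle can form.
        if name in self._nodes:
            raise ValueError(f"duplicate node: {name}")
        for dep in inputs:
            if dep not in self._nodes:
                raise KeyError(f"unknown input node: {dep}")
        self._nodes[name] = (func, inputs)

    def evaluate(self, outputs: Sequence[str]) -> Dict[str, Image]:
        cache: Dict[str, Image] = {}

        def visit(name: str) -> Image:
            # A node shared by several branches is computed only once.
            if name not in cache:
                func, inputs = self._nodes[name]
                cache[name] = func(*[visit(dep) for dep in inputs])
            return cache[name]

        return {name: visit(name) for name in outputs}


# Sources
def checker(width: int) -> Image:
    return [float(x % 2) for x in range(width)]


def gradient(width: int) -> Image:
    return [x / max(width - 1, 1) for x in range(width)]


# Filters
def invert(img: Image) -> Image:
    return [1.0 - v for v in img]


def blend(a: Image, b: Image) -> Image:
    return [(x + y) / 2 for x, y in zip(a, b)]


def build_example(width: int = 4) -> Graph:
    graph = Graph()
    graph.add("noise", lambda: checker(width))
    graph.add("ramp", lambda: gradient(width))
    graph.add("mix", blend, "noise", "ramp")
    graph.add("height", lambda img: list(img), "mix")  # output 1
    graph.add("diffuse", invert, "mix")                # output 2
    return graph


if __name__ == "__main__":
    print(build_example().evaluate(["diffuse", "height"]))
```

Because both outputs derive from the same `mix` node, they stay coherent with each other, in the sense the slide uses for diffuse and normal maps.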
Threading strategies
  • High level threading:
    • Task decomposition: 1 node (filter) per thread
    • Graph splitting ensures task independence
  • Low level threading:
    • Data decomposition: 1 strip of blocks per thread
    • Dispatcher ensures non-conflicting areas
    • Pixel to pixel filters are concatenated.
    • Streamed R/W, no L2 cache pollution
    • Temporary blocks in private L1 double buffers
    • Intermediate images never allocated
    • Lockless, reactive and cache-friendly synchronization
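
The low-level strategy can be sketched as follows: pixel-to-pixel filters are concatenated into one function, and each worker writes only its own strip of rows, so no locks are needed and no intermediate image is allocated. This illustrates the partitioning only. The names `fuse`, `split_rows` and `process` are hypothetical, and in CPython the global interpreter lock means this pure-Python loop will not actually run faster on more cores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

PixelFilter = Callable[[int], int]


def fuse(*filters: PixelFilter) -> PixelFilter:
    """Concatenate pixel-to-pixel filters into one per-pixel function."""
    def fused(value: int) -> int:
        for f in filters:
            value = f(value)
        return value
    return fused


def split_rows(height: int, strips: int) -> List[Tuple[int, int]]:
    """Split [0, height) into contiguous, non-overlapping row ranges."""
    strips = max(1, min(strips, height))
    base, extra = divmod(height, strips)
    ranges, start = [], 0
    for i in range(strips):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def process(image: List[int], width: int, height: int,
            pixel_filter: PixelFilter, workers: int = 4) -> List[int]:
    """Apply a fused per-pixel filter, one strip of rows per worker."""
    out = [0] * (width * height)

    def work(rows: Tuple[int, int]) -> None:
        start, stop = rows
        # Each worker touches a disjoint index range of `out`.
        for i in range(start * width, stop * width):
            out[i] = pixel_filter(image[i])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, split_rows(height, workers)))
    return out


if __name__ == "__main__":
    brighten = lambda v: min(255, v + 10)
    invert = lambda v: 255 - v
    image = list(range(12))  # 4 x 3 greyscale image
    print(process(image, 4, 3, fuse(brighten, invert), workers=2))
```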
Expect more streaming bandwidth
  • Substance generates 4 MB/s of compressed textures
  • Add this to classical streaming
  • 50+ MB/s loading with 4 cores and 1 GPU
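
To relate a compressed-texture bandwidth to texel counts, the block sizes of the DXT formats can be used: DXT1 stores each 4x4 block in 8 bytes and DXT5 in 16 bytes. The sketch below assumes 1 MB = 10^6 bytes and makes no claim about which DXT variant the slides refer to. The helper name is hypothetical.

```python
BYTES_PER_BLOCK = {"DXT1": 8, "DXT5": 16}  # each block covers 4x4 texels
TEXELS_PER_BLOCK = 16


def texels_per_second(bytes_per_second: float, fmt: str) -> float:
    """Convert a compressed-texture bandwidth into texels per second."""
    return bytes_per_second / BYTES_PER_BLOCK[fmt] * TEXELS_PER_BLOCK


if __name__ == "__main__":
    bandwidth = 4e6  # the 4 MB/s figure, assuming 1 MB = 10**6 bytes
    for fmt in ("DXT1", "DXT5"):
        rate = texels_per_second(bandwidth, fmt)
        print(f"{fmt}: {rate / 1e6:.0f} M texels/s")
```

Under these assumptions, 4 MB/s corresponds to 8 M texels/s in DXT1 or 4 M texels/s in DXT5.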
Here’s how close we got to the theoretical best performance:
  • DXT compression at 2G pixels/s (the same as high-end GPUs could do in 2007).
  • 8-bit SVG (cooked) rendering at 20G pixels/s; 8G pixels/s with 4-sub-sample anti-aliasing.
  • In most cases 4 cores give a x3.8 boost.
  • Some filters are more problematic, but solutions have been designed in detail and will be implemented between Q2 and Q4 2008.
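
The x3.8 boost on 4 cores can be read as a parallel efficiency, which makes it comparable with the 85% and 95% linearity figures quoted earlier in the deck. This is a small sketch of that arithmetic. The helper names are hypothetical, and projecting the same efficiency to other core counts is an assumption, not a measured result.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


def projected_speedup(efficiency: float, cores: int) -> float:
    """Speedup if the given efficiency held at this core count (assumption)."""
    return efficiency * cores


if __name__ == "__main__":
    print(f"x3.8 on 4 cores = {parallel_efficiency(3.8, 4):.0%} efficiency")
    print(f"85% on 4 cores  = x{projected_speedup(0.85, 4):.1f}")
```

So the measured x3.8 matches the 95% linearity target, while the 85% OpenMP figure would correspond to about x3.4 on the same 4 cores.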
Here’s the new performance profile:
  • Substance and ProFX2 figures are for one core.
  • 4 cores: 3.8 times more fillrate.
  • ProFX2: SVG GPU
  • Substance: SVG CPU
  • SVG AA: 2G pixels/s per core
This is future-proofed
  • The cooker precomputes whatever helps to linearise computations.
  • Scalable code: SSE4 added in one day thanks to the SIMD HAL
  • Scalable threading: our two strategies scale
  • A few functions dispatch virtual CPU "shaders"
  • 64-core ready ↔ code a new dispatcher?
  • Multiplatform design.
Future sources of bandwidth
  • SIMD code can be better pipelined in ASM.
  • Our cooker can optimize a lot of things.
  • The authoring tool will have a real-time profiler.
  • Artists gaining experience with Substance will also optimize their packages better.
  • Artist feedback will also help us to improve the expressiveness of each filter
  • ~30-50 filters per texture: the main performance divisor.
Here’s how you can best take advantage of procedural textures
  • Anticipate texture generation requests.
  • Predict visibility (HOM, PVS).
  • Create mipmaps. Access levels just in time (JIT).
  • Cache the useful texels.
  • Adapt texture resolution to workload.
  • Use texture variants, less tiling textures or details. Show a higher texel/pixel ratio.
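
Two of the recommendations above, just-in-time access to mip levels and caching of useful texels, can be sketched together. The following is a minimal illustration and not the engine's implementation: `mip_level`, `TexelCache` and the stand-in generator are hypothetical names, and least-recently-used eviction is an assumed policy that the slides do not specify.

```python
import math
from collections import OrderedDict
from typing import Callable, Optional, Tuple


def mip_level(texture_size: int, screen_pixels: int,
              max_level: Optional[int] = None) -> int:
    """Pick the coarsest mip level that still gives >= 1 texel per pixel."""
    if max_level is None:
        max_level = int(math.log2(texture_size))
    if screen_pixels <= 0:
        return max_level
    ratio = texture_size / screen_pixels
    level = int(math.floor(math.log2(ratio))) if ratio > 1 else 0
    return min(level, max_level)


class TexelCache:
    """Generates texture levels on first use and evicts the least recent."""

    def __init__(self, generate: Callable[[str, int], bytes],
                 capacity_texels: int) -> None:
        self._generate = generate
        self._capacity = capacity_texels
        self._used = 0
        self._entries: "OrderedDict[Tuple[str, int], bytes]" = OrderedDict()

    def get(self, name: str, size: int) -> bytes:
        key = (name, size)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        data = self._generate(name, size)   # just-in-time generation
        self._entries[key] = data
        self._used += len(data)
        # Evict oldest entries, but always keep the one just requested.
        while self._used > self._capacity and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self._used -= len(evicted)
        return data


if __name__ == "__main__":
    cache = TexelCache(lambda name, size: bytes(size * size), capacity_texels=100)
    level = mip_level(1024, 300)
    print("level", level, "->", 1024 >> level, "texels wide")
    print(len(cache.get("rock", 8)), "texels cached")
```

Adapting resolution to workload then amounts to passing a larger or smaller `screen_pixels` estimate, or lowering `capacity_texels` when memory is tight.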
What do you think?
  • Have you tried something like this?
  • Have you rejected trying something like this?