Energy-Precision Tradeoffs in the Graphics Pipeline

Jeff Pool, March 19th, 2012

Motivation: Why energy?
It matters everywhere:
- Mobile devices
- Desktop computers
- Servers, data centers
It’s a bottleneck to performance!
Image sources: http://www.ornl.gov/ornlhome/images/casl/TVA%20Watts%20Bar.jpg, http://img717.imageshack.us/img717/3936/1101771coolitomni.jpg

Motivation: Why precision?
[Figure: IEEE 754-2008 single-precision floating-point representation, with sign, exponent, and mantissa fields]
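To make the three fields concrete, here is a minimal Python sketch that splits a single-precision value into its sign, biased exponent, and stored mantissa. The function name `fp32_fields` is made up for this illustration and does not come from the slides.

```python
import struct

def fp32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, stored 23-bit mantissa) of x as an IEEE 754 float32."""
    # Round x to float32 and reinterpret its 32 bits as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, bias 127
    mantissa = bits & 0x7FFFFF        # 23 stored bits (the leading 1 is implicit)
    return sign, exponent, mantissa

print(fp32_fields(1.0))
print(fp32_fields(-0.75))
```

The 23 stored bits plus the implicit leading 1 give the 24 bits of precision that later slides refer to.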

Don’t Do Unnecessary Work
• Max precision isn’t needed:
  • 8-10 bit color buffers
  • FP32 => 24 bits of precision
• Potentially lots of wasted effort!
• It’s certainly more complicated, but worth exploring

My Approach: Variable-precision computations
- Reduce the precision when possible: 12.5 mantissa bits used
- Save energy in arithmetic: 70% less energy
- Low errors: 0.086% difference
[Figure: full-precision arithmetic vs. reduced-precision arithmetic]
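Reduced-precision arithmetic can be emulated in software by discarding low-order mantissa bits of a float32. The sketch below is only an emulation for experimenting with error, not the hardware technique from the talk. It assumes truncation (the hardware may round instead), counts the implicit leading 1 as one of the mantissa bits, and does not treat NaN or infinity specially. The name `reduce_precision` is hypothetical.

```python
import struct

def reduce_precision(x: float, mantissa_bits: int) -> float:
    """Truncate x to float32 with only `mantissa_bits` significand bits (1..24)."""
    if not 1 <= mantissa_bits <= 24:
        raise ValueError("mantissa_bits must be in 1..24")
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 24 - mantissa_bits                    # low-order stored bits to clear
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF     # keeps sign, exponent, high mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

value = 0.7853981852531433                       # an arbitrary float32 value
for bits in (24, 12, 8, 4):
    print(bits, reduce_precision(value, bits))
```

With truncation, the relative error of a normal value stays below 2^-(mantissa_bits - 1).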

My Approach: Communicate fewer bits
- Since fewer bits are used in computation
- Most DRAM traffic is already compressed
• Variable-precision compression (on sample frame):
  • Geometry improved by 12%
  • Depth improved by 83%
[Image: Crysis, 2007]

Background: The Graphics Pipeline
[Figure: the GPU and global memory, which holds texture data and the framebuffer]

GPUs: A Brief History
[Figure, NOT to scale: programmability and capability plotted against time, from Fixed-Function to GPGPU (CUDA, Stream, OpenCL). Labels: GPU, Shader Program, Compute Program, 1.53, 32.8, …]

Thesis Statement
Reducing the work done in the modern graphics pipeline through novel communication and variable-precision computation techniques can enable a tradeoff between energy savings and image fidelity, leading to significant energy savings without perceptible loss of image quality.

How?
Proving this thesis:
• Show that induced errors are imperceptible
• Show significant energy savings
  • Find energy consumed by entire pipeline
  • Find energy savings possible in each stage

Roadmap
• My work
  • Energy model
  • Energy savings in computation
  • Energy savings in communication
• Conclusions
• Future work

Why an Energy Model?
• Shows how much difference saving energy in each stage actually makes, so I know where to focus
• Provides researchers/developers a tool to predict energy usage

Strategy
• Model construction
  • Experimentally measure energy for each operation
• Energy prediction
  • Profile a scene for operations performed
  • Predict total energy consumption (dot product)
• Validation
  • Compare prediction with measured energy
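The "dot product" in the prediction step can be sketched as follows: multiply each operation's profiled count by its measured per-operation energy and sum. The operation names follow the slides, but every number below is a placeholder invented for illustration, not a measurement from the talk, and `predict_energy` is a hypothetical name.

```python
def predict_energy(op_counts: dict[str, int], energy_per_op: dict[str, float]) -> float:
    """Dot product of per-operation counts and per-operation energies (joules)."""
    return sum(count * energy_per_op[op] for op, count in op_counts.items())

# Placeholder per-operation energies in joules (NOT measured values).
energy_per_op = {"ADD": 1.0e-10, "MUL": 1.5e-10, "GLOBAL_LOAD": 2.0e-9}

# Placeholder operation counts for one profiled frame (NOT real profile data).
frame_profile = {"ADD": 4_000_000, "MUL": 6_000_000, "GLOBAL_LOAD": 500_000}

print(f"predicted frame energy: {predict_energy(frame_profile, energy_per_op):.6f} J")
```

Validation then amounts to comparing this prediction with the energy measured while the same frame renders.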

What Operations?
• Arithmetic
  • ADD, MUL, SIN/COS, POW, LOG, …
• Memory
  • Local/Global Load/Store
• Programmable
  • Vertex/Pixel Shaders
• Fixed-function
  • Rasterization, Texture filtering
[Slide labels: Explicit, Implicit]

Measuring Energy in the GPU
Explicit:
• GPGPU
• Runs on same hardware as graphics
• No ambiguity in operations
• Simple microkernels
• Little/no overhead
• 10s runtime
• Directed tests per operation
Implicit:
• OpenGL
• Enable/Disable operation in question
• Difference in energy is the operation’s contribution
• Not as straightforward
• Ex.: Texture filtering

Experimental Setup
• NVIDIA 8300GS graphics card
• Adex Electronics’ PEX16LX PCI riser to interrupt power from motherboard
• Supply metered power to the card
  • 12V
  • 3.3V
  • 12V (fan, not counted in energy)
• Log runtimes/framerates, measure current as tests run
Image source: http://www.pretaktovanie.sk/obr/spotreba/eng/PICTURES/P1010283_ENG.jpg
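The measurement idea can be sketched in a few lines: integrate voltage times current over a run to get energy, then subtract a run with the operation disabled from a run with it enabled. The sketch assumes current is logged at a fixed interval on the two counted rails (12V and 3.3V). The sample values, the interval, and the function names are hypothetical.

```python
RAIL_VOLTS = (12.0, 3.3)  # the two metered rails that count toward card energy

def run_energy_joules(samples: list[tuple[float, float]], dt_s: float) -> float:
    """Sum V * I * dt over a log of (I_12V, I_3.3V) current samples in amperes."""
    return sum(
        sum(volts * amps for volts, amps in zip(RAIL_VOLTS, sample)) * dt_s
        for sample in samples
    )

def operation_contribution(enabled: list[tuple[float, float]],
                           disabled: list[tuple[float, float]],
                           dt_s: float) -> float:
    """Energy attributed to an operation: run with it enabled minus run with it disabled."""
    return run_energy_joules(enabled, dt_s) - run_energy_joules(disabled, dt_s)

# Hypothetical 1-second logs sampled every 0.1 s.
with_filtering = [(2.0, 1.0)] * 10
without_filtering = [(1.8, 1.0)] * 10
print(operation_contribution(with_filtering, without_filtering, dt_s=0.1))
```

Dividing the difference by the number of times the operation executed would give a per-operation figure suitable for the energy model.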

Results

Profiling Operations Performed
• Use Microsoft’s PIX to log a frame of a running application:
  • Framebuffer contents
  • Vertex data
  • Render states
  • Vertex shaders
  • Pixel shaders
  • Per draw call (100-1000s per frame)
• From all this data, extract operations

Validation
• Three different applications, four scenes
• Real-world games to test the developed model
• Harvested data, predict energy usage
• Measured real energy usage, compare
Applications: Half Life 2: Lost Coast (High/Low Rendering Qualities), Batman: Arkham Asylum, Mass Effect

Validation Results Overheads

What Uses the Energy?

Roadmap
• My work
  • Energy model
  • Energy savings in computation
  • Energy savings in communication
• Conclusions
• Future work

Where Does the Power Go?
[Figure: CMOS inverter between power and ground]
$P_{total} = P_{dynamic} + P_{static}$
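For readers who want the two terms spelled out, a common first-order textbook model of CMOS power is shown below. It is not taken from the slides. The symbols are assumptions introduced here: $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V_{dd}$ the supply voltage, $f$ the clock frequency, and $I_{leak}$ the leakage current.

```latex
P_{total} = P_{dynamic} + P_{static}, \qquad
P_{dynamic} \approx \alpha \, C \, V_{dd}^{2} \, f, \qquad
P_{static} \approx I_{leak} \, V_{dd}
```

Under this model, stopping unnecessary switching lowers $\alpha$ and so the dynamic term, while cutting supply to idle circuitry targets the leakage behind the static term.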

Energy-Saving Techniques
• Clock gating (Park et al., 2010)
• Signal gating (Huang and Ercegovac, 2003)
• Power gating
  • Coarse (Usami et al., 2009, Sjalander et al., 2005)
  • Fine (My work)
$P_{total} = P_{dynamic} + P_{static}$

Example: 1-Bit Adder
[Figure: 1-bit adder with inputs A, B, Cin, outputs S, Cout, and a !Enable gating signal]

HW Results
SPICE simulations of:
• Adders: linear savings
• Multipliers: quadratic savings
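One way to build intuition for the linear versus quadratic trend is to count active cells. This is a rough proxy, not the SPICE methodology from the talk. It assumes a ripple-carry adder with one full-adder cell per bit and an array multiplier with one partial-product cell per pair of bits, and that gated-off cells consume nothing. All names are hypothetical.

```python
FULL_BITS = 24  # full single-precision significand width

def active_fraction_adder(bits: int) -> float:
    """Fraction of adder cells still powered: one cell per active bit (linear)."""
    return bits / FULL_BITS

def active_fraction_multiplier(bits: int) -> float:
    """Fraction of multiplier cells still powered: bits x bits partial products (quadratic)."""
    return (bits / FULL_BITS) ** 2

for bits in (24, 16, 12, 8):
    print(bits, active_fraction_adder(bits), active_fraction_multiplier(bits))
```

Halving the precision leaves half of the adder cells active but only a quarter of the multiplier cells, which matches the shape of the reported savings.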

Precision in Rendering
• Variable-Precision fixed-function CPU rendering
  • Hao and Varshney, 2001
  • 3 key differences: GPU, FP32, programmability
• Depth buffer comparator
  • Hensley, Singh, and Lastra, 2005
• Triangle separation for correct occlusion
  • Akeley and Su, 2006

So, we have hardware; let’s see what happens in variable-precision pixel shaders

A Pixel Shader

Exaggerated Texture Coordinate Errors
[Figure: original frame (24 mantissa bits) vs. blocky textures (8 mantissa bits)]

Arithmetic Errors
[Figure: original frame (24 mantissa bits) vs. … different? (8 mantissa bits)]

Exaggerated Arithmetic Errors
[Figure: original frame (24 mantissa bits) vs. clearly different (4 mantissa bits)]

Different Errors, Different Tolerances
• Colors can be pushed far lower
  • 12, 10, 8 bits for color components (plus one for rounding)
• Texture coordinates may need to be fully precise!
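A small experiment shows why texture coordinates are less tolerant than colors. The sketch below truncates the normalized coordinate of every texel center in one row of a texture and counts how many distinct texels remain addressable. The 1024-texel width, the use of truncation, and the function names are assumptions chosen for illustration.

```python
import struct

def truncate_significand(x: float, bits: int) -> float:
    """Truncate x to a float32 with `bits` significand bits (implicit leading 1 included)."""
    raw = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << (24 - bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", raw & mask))[0]

def distinct_texels(texture_width: int, bits: int) -> int:
    """Count texels still reachable when texel-center coordinates lose precision."""
    reached = set()
    for texel in range(texture_width):
        u = (texel + 0.5) / texture_width           # normalized coordinate of the texel center
        reached.add(int(truncate_significand(u, bits) * texture_width))
    return len(reached)

for bits in (24, 12, 8):
    print(bits, distinct_texels(1024, bits))
```

When several neighboring coordinates collapse onto the same texel, the texture is sampled in blocks, which is the blocky artifact in the exaggerated example.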

So, Treat Them Separately
• A: Could contribute to texture coordinates
• B: Will NOT contribute to texture coordinates
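The split into the two classes can be sketched as a backward dependency walk over a shader. The instruction format below is a hypothetical straight-line intermediate representation invented for this example (no branches or loops), with `TEX` standing for a texture fetch whose sources are its coordinates. It is not the analysis used in the talk.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    dest: str               # register written
    op: str                 # opcode; "TEX" marks a texture fetch
    srcs: tuple[str, ...]   # registers read

def classify(program: list[Instr]) -> list[str]:
    """Label each instruction 'A' (may feed a texture coordinate) or 'B' (never does)."""
    needed: set[str] = set()            # registers whose values reach a texture coordinate
    labels = ["B"] * len(program)
    for i in range(len(program) - 1, -1, -1):   # walk backward from the end
        ins = program[i]
        if ins.dest in needed:
            labels[i] = "A"
            needed.discard(ins.dest)    # this definition satisfies the need
            needed.update(ins.srcs)     # its inputs now matter too
        if ins.op == "TEX":
            needed.update(ins.srcs)     # fetch coordinates must stay precise
    return labels

program = [
    Instr("uv2", "MUL", ("uv", "scale")),
    Instr("n", "TEX", ("uv2",)),
    Instr("ndotl", "DP3", ("n", "light")),
    Instr("color", "MUL", ("ndotl", "albedo")),
]
print(classify(program))
```

Instructions labeled A would keep high precision, while those labeled B are candidates for the lower color precisions.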

Precision Selection Strategies
• Statically
• Artist-directed
• Automatic closed-loop

Static Program Analysis
[Figure: program annotated with per-step precisions: 9 bits, 10 bits, 12 bits, 11 bits, 10 bits, 9 bits, and so on…]

Artist-Directed Precisions
Precisions are chosen as the effect is designed

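Automatic Closed-Loop Precision Selection
• Run-time feedback control
• Per-shader error detection and precision control
[Figure: feedback loop. The renderer sends reduced-precision pixels to the display and to error detection, which compares them with sparsely sampled full-precision pixels; the resulting error drives a controller that sets the renderer’s precision]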
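The feedback idea can be illustrated with the simplest possible controller: raise the precision by one bit when the measured error exceeds a threshold, otherwise lower it by one bit. The talk applies several feedback control schemes without detailing them here, so this step controller, its limits, and the toy error model in `simulate` are all assumptions made for illustration.

```python
def next_precision(current_bits: int, measured_error: float, error_threshold: float,
                   min_bits: int = 4, max_bits: int = 24) -> int:
    """One controller step: add a bit if the error is too high, otherwise drop a bit."""
    if measured_error > error_threshold:
        return min(max_bits, current_bits + 1)
    return max(min_bits, current_bits - 1)

def simulate(frames: int, error_threshold: float = 2.0 ** -10) -> list[int]:
    """Run the controller against a toy error model where error = 2**-bits."""
    bits = 24
    history = []
    for _ in range(frames):
        error = 2.0 ** -bits            # hypothetical stand-in for sparse error detection
        bits = next_precision(bits, error, error_threshold)
        history.append(bits)
    return history

print(simulate(40))
```

In a real renderer the error would come from comparing reduced-precision pixels against the sparsely sampled full-precision pixels, per shader.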

Experimental Setup
Static analysis
• Analyze shaders to find minimum safe operating precision
Artist-directed
• Modify several demo applications
• Allow the artist to choose precisions
Automatic closed-loop
• Modify the ATTILA GPU simulator
• Apply several feedback control schemes
• Several test scenes

Data Sets

Results: Precisions Lower is Better!

Results: Closed-Loop Errors Unnoticeable in practice

Results: % Energy Savings Overall Energy: 2/31/5 Higher is Better!

Which Precision Selection Method?

Directed Approach
• High savings
  • 70-80% in arithmetic
  • 10-20% overall GPU energy (by arithmetic alone!)
• Low errors
  • Acceptable by design
  • Quantitatively low (PSNR, % error)
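The two quantitative measures named above can be computed as in the sketch below. PSNR follows its standard definition. The exact "% error" formula used in the talk is not given, so mean absolute difference relative to the peak value is an assumption, as are the function names and the tiny sample data.

```python
import math

def psnr_db(reference: list[float], test: list[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def percent_error(reference: list[float], test: list[float], peak: float = 255.0) -> float:
    """Mean absolute difference as a percentage of the peak value (assumed definition)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return 100.0 * sum(abs(r - t) for r, t in zip(reference, test)) / (len(reference) * peak)

full = [12.0, 200.0, 37.0, 255.0]       # made-up full-precision pixel values
reduced = [12.0, 199.0, 38.0, 255.0]    # made-up reduced-precision pixel values
print(psnr_db(full, reduced), percent_error(full, reduced))
```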

Variable Precision Geometry
• Vertex shaders
• Similarly high savings (55-80%)
• Different types of errors
  • XY Screen-space
  • Depth
