
Energy-Precision Tradeoffs in the Graphics Pipeline

Energy-Precision Tradeoffs in the Graphics Pipeline. Jeff Pool, March 19th, 2012. Motivation. Why energy? It matters everywhere: mobile devices; desktop computers; servers and data centers. It’s a bottleneck to performance!


Presentation Transcript


  1. Energy-Precision Tradeoffs in the Graphics Pipeline Jeff Pool March 19th, 2012

  2. Motivation Why energy? It matters everywhere: - Mobile devices - Desktop computers - Servers, data centers It’s a bottleneck to performance! http://www.ornl.gov/ornlhome/images/casl/TVA%20Watts%20Bar.jpg http://img717.imageshack.us/img717/3936/1101771coolitomni.jpg

  3. Motivation Why precision? Sign Exponent Mantissa IEEE 754-2008 Single-Precision Floating-Point Representation

  4. Don’t do Unnecessary Work • Max precision isn’t needed: • 8-10 bit color buffers • FP32 => 24 bits of precision • Potentially lots of wasted effort! • It’s certainly more complicated, but worth exploring

  5. My Approach Variable-precision computations - Reduce the precision when possible: 12.5 mantissa bits used - Save energy in arithmetic: 70% less energy - Low errors: 0.086% difference [Figure panels: Full-Precision Arithmetic; Reduced-Precision Arithmetic]
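A minimal sketch of what reducing mantissa precision means at the bit level. It assumes plain truncation of the low mantissa bits of an FP32 value; the slides do not say which rounding mode the hardware uses, and the function name `truncate_mantissa` is hypothetical. Following the slides, `precision` counts the implicit leading bit, so full FP32 precision is 24.

```python
import struct

def truncate_mantissa(x: float, precision: int) -> float:
    """Return x as an FP32 value with only `precision` mantissa bits kept.

    `precision` includes the implicit leading 1, so 24 means full FP32.
    Assumption: dropped bits are simply zeroed (truncation, no rounding).
    """
    if not 1 <= precision <= 24:
        raise ValueError("precision must be between 1 and 24")
    # Reinterpret the FP32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 24 - precision                      # number of stored bits to clear
    mask = 0xFFFFFFFF & ~((1 << drop) - 1)     # keeps sign, exponent, high mantissa
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

if __name__ == "__main__":
    for p in (24, 12, 8, 4):
        print(p, truncate_mantissa(0.3, p))
```

Sign and exponent are left untouched, so the relative error is bounded by the weight of the lowest bit that is kept.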

  6. My Approach Communicate fewer bits - Since fewer bits are used in computation - Most DRAM traffic is already compressed • Variable-precision compression: • (on sample frame) • Geometry improved by 12% • Depth improved by 83% Crysis, 2007

  7. Background: The Graphics Pipeline [Figure: the GPU and its global memory, which holds texture data and the framebuffer]

  8. GPUs: A Brief History [Figure: programmability and capability over time (NOT to scale!), from fixed-function hardware through GPU shader programs to GPGPU compute programs (CUDA, Stream, OpenCL)]

  9. Thesis Statement Reducing the work done in the modern graphics pipeline through novel communication and variable-precision computation techniques can enable a tradeoff between energy savings and image fidelity, leading to significant energy savings without perceptible loss of image quality.

  10. How? Proving this thesis: • Show that induced errors are imperceptible • Show significant energy savings • Find energy consumed by entire pipeline • Find energy savings possible in each stage

  11. Roadmap • My work • Energy model • Energy savings in computation • Energy savings in communication • Conclusions • Future work

  12. Roadmap • My work • Energy model • Energy savings in computation • Energy savings in communication • Conclusions • Future work

  13. Why an Energy Model? • Shows how much difference saving energy in each stage actually makes, and so where to focus • Provides researchers/developers a tool to predict energy usage

  14. Strategy • Model construction • Experimentally measure energy for each operation • Energy prediction • Profile a scene for operations performed • Predict total energy consumption (dot product) • Validation • Compare prediction with measured energy
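The prediction step above can be sketched as a dot product between per-frame operation counts and per-operation energies. This is an illustrative sketch only: the function name, the operation keys, and every number in the usage example are placeholders, not values measured in the presentation.

```python
def predict_energy(op_counts, energy_per_op):
    """Predict total energy as sum(count[op] * energy[op]) over all operations.

    op_counts:     {operation name: times performed in the profiled frame}
    energy_per_op: {operation name: measured energy per operation, in joules}
    """
    missing = set(op_counts) - set(energy_per_op)
    if missing:
        raise KeyError(f"no measured energy for: {sorted(missing)}")
    return sum(count * energy_per_op[op] for op, count in op_counts.items())

if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    counts = {"ADD": 1_000_000, "MUL": 2_000_000, "GLOBAL_LOAD": 50_000}
    joules = {"ADD": 1e-10, "MUL": 2e-10, "GLOBAL_LOAD": 5e-9}
    print(predict_energy(counts, joules))
```

Validation then amounts to comparing this predicted total against the energy measured while the same frame renders.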

  15. What Operations? • Explicit: • Arithmetic: ADD, MUL, SIN/COS, POW, LOG, … • Memory: Local/Global Load/Store • Implicit: • Programmable: Vertex/Pixel Shaders • Fixed-function: Rasterization, Texture filtering

  16. Measuring Energy in the GPU • Explicit (GPGPU): • Runs on same hardware as graphics • No ambiguity in operations • Simple microkernels • Little/no overhead • 10s runtime • Directed tests per operation • Implicit (OpenGL): • Enable/disable operation in question • Difference in energy is the operation’s contribution • Not as straightforward • Ex.: Texture filtering

  17. Experimental Setup • NVIDIA 8300GS graphics card • Adex Electronics’ PEX16LX PCI riser to interrupt power from motherboard • Supply metered power to the card • 12V • 3.3V • 12V (fan, not counted in energy) • Log runtimes/framerates, measure current as tests run http://www.pretaktovanie.sk/obr/spotreba/eng/PICTURES/P1010283_ENG.jpg
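A rough sketch of how logged current could be turned into energy, and how an operation's contribution could be isolated by differencing two runs. It assumes current is sampled at a fixed interval on the 12V and 3.3V rails (the fan rail is excluded, as on the slide); the sampling scheme and both function names are assumptions, not details given in the presentation.

```python
def rail_energy_joules(samples, dt):
    """Integrate power over time from (amps_12v, amps_3v3) current samples.

    samples: iterable of (current on 12V rail, current on 3.3V rail) in amps
    dt:      time between samples in seconds (assumed constant)
    """
    return sum((12.0 * i12 + 3.3 * i33) * dt for i12, i33 in samples)

def energy_per_operation(energy_with, energy_without, n_ops):
    """Attribute the energy difference between two runs to n_ops operations."""
    if n_ops <= 0:
        raise ValueError("n_ops must be positive")
    return (energy_with - energy_without) / n_ops

if __name__ == "__main__":
    # Placeholder readings: 1 s of samples at 10 Hz for each run.
    baseline = rail_energy_joules([(1.0, 0.5)] * 10, 0.1)
    with_op = rail_energy_joules([(1.2, 0.5)] * 10, 0.1)
    print(energy_per_operation(with_op, baseline, 1_000_000))
```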

  18. Results

  19. Profiling Operations Performed • Use Microsoft’s PIX to log a frame of a running application: • Framebuffer contents • Vertex data • Render states • Vertex shaders • Pixel shaders • Per draw call (100-1000s per frame) • From all this data, extract operations

  20. Validation • Three different applications, four scenes • Real-world games to test the developed model • Harvested data, predict energy usage • Measured real energy usage, compare Half Life 2: Lost Coast (High/Low Rendering Qualities) Batman: Arkham Asylum Mass Effect

  21. Validation Results Overheads

  22. What Uses the Energy?

  23. Roadmap • My work • Energy model • Energy savings in computation • Energy savings in communication • Conclusions • Future work

  24. Where Does the Power Go? P_total = P_dynamic + P_static [Figure: CMOS inverter between power and ground]

  25. Energy-Saving Techniques • Clock gating (Park et al., 2010) • Signal gating (Huang and Ercegovac, 2003) • Power gating • Coarse (Usami et al., 2009, Sjalander et al., 2005) • Fine (My work) P_total = P_dynamic + P_static
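For reference, the two terms of the power equation are commonly expanded with the standard first-order CMOS model below. This expansion is textbook background rather than something stated on the slides; the symbols α (switching activity), C (switched capacitance), V_dd (supply voltage), f (clock frequency), and I_leak (leakage current) are introduced here.

```latex
\begin{aligned}
P_{\text{total}}   &= P_{\text{dynamic}} + P_{\text{static}} \\
P_{\text{dynamic}} &\approx \alpha \, C \, V_{dd}^{2} \, f \\
P_{\text{static}}  &\approx I_{\text{leak}} \, V_{dd}
\end{aligned}
```

Under this model, clock gating and signal gating lower the switching activity α and so reduce only the dynamic term, while power gating disconnects idle logic from the supply and so also removes its leakage.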

  26. Example: 1-Bit Adder [Figure: 1-bit full adder with inputs A, B, Cin, outputs S, Cout, and a !Enable gating signal]

  27. HW Results SPICE simulations of: Adders: linear savings Multipliers: quadratic savings
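A toy model of the scaling reported above, assuming energy is exactly proportional to the active bit width for adders and to its square for multipliers. Real circuits have fixed overheads, so this idealization is an assumption and not a reproduction of the SPICE results; the function name is hypothetical.

```python
def relative_energy(bits, full_bits=24, kind="adder"):
    """Energy at `bits` of precision relative to full precision (idealized).

    adder:      energy scales linearly with active bits
    multiplier: energy scales quadratically with active bits
    """
    if not 0 <= bits <= full_bits:
        raise ValueError("bits must be between 0 and full_bits")
    fraction = bits / full_bits
    if kind == "adder":
        return fraction
    if kind == "multiplier":
        return fraction ** 2
    raise ValueError(f"unknown unit kind: {kind!r}")

if __name__ == "__main__":
    for kind in ("adder", "multiplier"):
        saved = 1.0 - relative_energy(12, kind=kind)
        print(f"{kind}: {saved:.0%} saved at 12 of 24 bits")
```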

  28. Precision in Rendering Variable-Precision fixed-function CPU rendering • Hao and Varshney, 2001 • 3 key differences: GPU, FP32, programmability Depth buffer comparator • Hensley, Singh, and Lastra, 2005 Triangle separation for correct occlusion • Akeley and Su, 2006

  29. So, we have hardware, let’s see what happens in Variable-Precision pixel shaders

  30. A Pixel Shader

  31. Exaggerated Texture Coordinate Errors Original frame (24 mantissa bits) Blocky textures (8 mantissa bits)

  32. Arithmetic Errors Original frame (24 mantissa bits) … Different? (8 mantissa bits)

  33. Exaggerated Arithmetic Errors Original frame (24 mantissa bits) Clearly different (4 mantissa bits)

  34. Different Errors, Different Tolerances • Colors can be pushed far lower • 12, 10, 8 bits for color components (plus one for rounding) • Texture coordinates may need to be fully precise!

  35. So, Treat Them Separately

  36. So, Treat Them Separately Could contribute to texture coordinates A

  37. So, Treat Them Separately Could contribute to texture coordinates A B Will NOT contribute to texture coordinates

  38. Precision Selection Strategies • Statically • Artist-directed • Automatic closed-loop

  39. Static Program Analysis [Figure: shader instructions annotated with required precisions: 9 bits, 10 bits, 12 bits, 11 bits, 10 bits, 9 bits, and so on…]

  40. Artist-Directed Precisions Precisions are chosen as the effect is designed

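  41. Automatic Closed-Loop Precision Selection • Run time feedback control • Per-shader error detection and precision control [Figure: feedback loop in which the renderer sends reduced-precision pixels to the display and to error detection, sparsely sampled full-precision pixels also feed error detection, and a controller turns the measured error into a new precision for the renderer]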
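One possible shape for the controller in such a loop, shown as a hedged sketch. The presentation mentions several feedback control schemes without specifying them, so the asymmetric step rule, the thresholds, and the name `update_precision` are all assumptions made for illustration.

```python
def update_precision(precision, error, max_error,
                     step_up=2, step_down=1, lo=4, hi=24):
    """Return the next per-shader precision given the last measured error.

    error:     error between reduced-precision and sparsely sampled
               full-precision pixels (any consistent metric)
    max_error: largest error considered acceptable
    Precision rises quickly when the error is too high and falls slowly
    when there is comfortable headroom (error below half the limit).
    """
    if error > max_error:
        precision += step_up
    elif error < 0.5 * max_error:
        precision -= step_down
    return max(lo, min(hi, precision))

if __name__ == "__main__":
    p = 24
    for err in (0.001, 0.001, 0.02, 0.2, 0.06):  # placeholder error readings
        p = update_precision(p, err, max_error=0.1)
        print(p)
```

Raising precision faster than lowering it biases the loop toward image quality, which matches the stated goal of keeping errors unnoticeable.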

  42. Experimental Setup Static analysis • Analyze shaders to find minimum safe operating precision Artist-directed • Modify several demo applications • Allow the artist to choose precisions Automatic closed-loop • Modify the ATTILA GPU simulator • Apply several feedback control schemes • Several test scenes

  43. Data Sets

  44. Data Sets

  45. Results: Precisions Lower is Better!

  46. Results: Closed-Loop Errors Unnoticeable in practice

  47. Results: % Energy Savings Overall Energy: 2/31/5 Higher is Better!

  48. Which Precision Selection Method?

  49. Directed Approach • High savings • 70-80% in arithmetic • 10-20% overall GPU energy • (by arithmetic alone!) • Low errors • Acceptable by design • Quantitatively low (PSNR, % error)
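PSNR, one of the error metrics named above, can be computed as in the sketch below. It assumes images are given as flat sequences of 8-bit channel values with a peak of 255; how the presentation actually sampled or aggregated pixels is not stated.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

if __name__ == "__main__":
    full = [10, 200, 33, 90]      # placeholder full-precision pixel values
    reduced = [10, 199, 34, 90]   # placeholder reduced-precision values
    print(psnr(full, reduced))
```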

  50. Variable Precision Geometry • Vertex shaders • Similarly high savings (55-80%) • Different types of errors • XY Screen-space • Depth
