1 / 27

Workload Characterization of 3D Games

Workload Characterization of 3D Games. Jordi Roca, Victor Moya, Carlos González, Chema Solis, Agustín Fernandez and Roger Espasa (Intel DEG Barcelona). Computer Architecture Department. Outline. Introduction Game selection & stats gathering Game analysis System → GPU traffic

maegan
Download Presentation

Workload Characterization of 3D Games

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Workload Characterization of 3D Games Jordi Roca, Victor Moya, Carlos González, Chema Solis, Agustín Fernandez and Roger Espasa (Intel DEG Barcelona) Computer Architecture Department

  2. Outline • Introduction • Game selection & stats gathering • Game analysis • System →GPU traffic • Primitive culling efficiency • Rasterization pipeline • Fragment shading & texturing • Memory usage • Conclusions

  3. Introduction • Games and GPU evolve fast • GPUs cater for game demands: • Better effects (flexible programming models) • Higher fill-rate (more processing power) • Higher quality (HDR, MSAA, AF) • Games highly tuned to released GPUs • New characterization needed for every Game and GPU generation.

  4. Outline • Introduction • Game selection & stats gathering • Game analysis • System →GPU traffic • Primitive culling efficiency • Rasterization pipeline • Fragment shading & texturing • Memory usage • Conclusions

  5. Game workload selection • Resolution: 1024x768

  6. Collect Verify Simulate Analyze Trace GLInterceptor OpenGL API call stats OpenGL API call stats GLPlayer GLPlayer Vendor OGL Driver ATTILA OGL Driver Vendor OGL Driver Vendor OGL Driver Vendor OGL Driver μ-arch stats ATI R520/NVidia G70 ATI R520/NVidia G70 ATI R520/NVidia G70 ATTILA Simulator ATI R520/NVidia G70 Signal Traffic Framebuffer Framebuffer Framebuffer Framebuffer Framebuffer CHECK! CHECK! Signal Visualizer Statistics environment (OpenGL) OGL Application OGL Application GLInterceptor

  7. Statistics environment (Direct3D) Collect Verify Simulate Analyze D3D Application PIXRun Trace Microsoft PIX Direct3D API call stats DXPlayer Microsoft D3D Driver Microsoft D3D Driver ATI R520/NVidia G70 ATI R520/NVidia G70 Framebuffer Framebuffer CHECK!

  8. Outline • Introduction • Game selection & stats gathering • Game analysis • System →GPU traffic • Primitive culling efficiency • Rasterization pipeline • Fragment shading & texturing • Memory usage • Conclusions

  9. System →GPU traffic * T. Mitra. T. Chiueh, “Dynamic 3D Graphics Workload Characterization and the architectural implications”, MICRO ‘99

  10. System →GPU traffic Index BW

  11. Post-T&L vertex cache Primitive Assembly v2 Index Buffer v1 Vertex data Fetcher Vertex shader (T&L) v4 v3 Memory System →GPU traffic Post-T&L vertex cache • For adjacent triangles lists: • 2/3 of referenced vertexes already computed : 66% hit rate

  12. Post-T&L vertex cache experiments System →GPU traffic • Results show expected hit rate • Game preference for triangle lists: • Low Bus BW usage related to index sent • Same vertex computation work as with strips or fans using a Post-T&L vertex cache • Triangle lists are easier managed by modeling tools.

  13. Outline • Introduction • Game selection & stats gathering • Game analysis • System →GPU traffic • Primitive culling efficiency • Rasterization pipeline • Fragment shading & texturing • Memory usage • Conclusions

  14. Assembled triangles Traversed triangles Primitive culling efficiency • Clipping/Culling intensively used by our games. • Quake4: half of the polygons lie out of the view volume. • Game renderer engines let GPU do the important clipping/culling work: • Easier and cheaper in GPU Hardware.

  15. Outline • Introduction • Game selection & stats gathering • Game analysis • System →GPU traffic • Primitive culling efficiency • Rasterization pipeline • Fragment shading & texturing • Memory usage • Conclusions

  16. Boundaries generate non-full quads Rasterization pipeline The Basics • Triangles are broken into quads (2x2 fragments) • Quad frags are tested individually in different stages: • Z test (hidden surfaces),Stencil test, Alpha Test (transparency), Color Mask. • Finally alive frags update framebuffer • Empty quads are not further processed

  17. Rasterization pipeline Experimentation • Quad generation efficiency: • Higher efficiency than reported in [Mitra 99] • Results show between 40 and 60% efficiencies. • Interactive 3D games use less detailed 3D models (larger triangles).

  18. Rasterization pipeline • Doom3 and Quake4 • Polygon rasterization overhead due to stencil shadow volumes (SSV)

  19. Rasterization pipeline • Fragment rejection breakdown: • On-die HZ greatly reduces GDDR BW avoiding Z&Stencil buffer accesses. • In SSV games: Still room for higher BW reduction with HZ performing also Stencil test

  20. Outline • Introduction • Game selection & stats gathering • Game analysis • System →GPU traffic • Primitive culling efficiency • Rasterization pipeline • Fragment shading & texturing • Memory usage • Conclusions

  21. Fragment shading & texturing • Texture filtering cost measured in bilinears: • Texture pipelines can usually execute 1 bilinear/cycle Bilinear filtering: 1 bilinear (constant) Trilinear filtering: 2 bilinears (constant) Anisotropic filtering: from 2 up to 32 bilinears (variable)

  22. Fragment shading & texturing • ALU to Texture Ratio • ATI Xenos, RV530, R580 peak performance: • Up to 3 ALU instructions per bilinear • 80% ALU power not used

  23. Outline • Introduction • Game selection & stats gathering • Game analysis • System →GPU traffic • Primitive culling efficiency • Rasterization pipeline • Fragment shading & texturing • Memory usage • Conclusions

  24. Memory usage • Memory Hierarchy: • Specialized features: • Fast clears • Transparent compression • Hit rate and miss BW: • In non-SSV games (UT2004): • Most demanding stages: Texture, Color. • In SSV games (Doom3, Quake4) • The most demanding stage: Z&Stencil (50%!!)

  25. Conclusions

  26. Conclusions • Do our 3D games use GPU resources efficiently?

  27. Conclusions • Some inferred implications

More Related