
Developing Efficient Graphics Software


hanh

Presentation Transcript


  1. Developing Efficient Graphics Software

  2. Developing Efficient Graphics Software • Intent of Course • Identify application and hardware interaction • Quantify and optimize interaction • Identify efficient software structure • Balance software and hardware system component use

  3. Developing Efficient Graphics Software • Outline • 1:35 Hardware and graphics architecture and performance • 2:05 Software and System Performance • Break • 2:55 Software profiling and performance analysis • 3:20 C/C++ language issues • 3:50 Graphics techniques and algorithms • 4:40 Performance Hints

  4. Developing Efficient Graphics Software • Speakers • Applications Consulting Engineers for SGI • optimizing, differentiating, graphics • Keith Cok, Bob Kuehne, Thomas True, Alan Commike

  5. Hardware & Graphics Architecture & Performance Bob Kuehne, SGI

  6. Course Overview • Why is your application drawing so slowly? • Could actually be the graphics • Could be the data traversal • Could be something entirely different

  7. Tour Guide • Platform architecture & components • CPU • Memory • Graphics • Graphics performance • Measurements: triangle rate, fill rate, misc. • Reproduce & maximize

  8. Bottlenecks & Balance • Bottlenecks • Find them • Eliminate them (sort of - move them around) • Balance • Understand hardware architecture • Fully utilize hardware

  9. Yin & Yang • “Yin and yang are the two primal cosmic principles of the universe” • “The best state for everything in the universe is a state of harmony represented by a balance of yin and yang.” • Skeptics Dictionary -- http://skepdic.com/yinyang.html

  10. Write Once Run Everywhere? • My application ran fast on that platform! Why is this one so slow? • Different platforms require different tuning • Different platforms implement hardware differently • Macro: Architecture & features • Micro: Storage capacities, buffers, & caches • Effect: Bandwidth & latency

  11. Latency & Bandwidth • Definitions: • Latency: time required to communicate a unit of data • Bandwidth: data transferred per unit time • Example (legend: s = texture setup time, t = texture download time, each slot = one unit of time): • Latency bottleneck: [figure: timeline alternating s and t] • Bandwidth bottleneck: [figure: timeline of consecutive t]
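A minimal cost model of the two cases, assuming each transfer costs a fixed setup time s plus a download time t (the slide's legend) and that nothing overlaps. Paying setup on every transfer gives n(s + t); paying it once and then streaming gives s + nt. The function names are hypothetical.

```c
/* Cost model for n texture downloads, in abstract units of time.
   s = setup time per transfer, t = download time per transfer.
   The no-overlap model itself is an assumption for illustration. */

/* Latency-bound pattern: every download waits for its own setup. */
static double latency_bound_time(int n, double s, double t)
{
    return n * (s + t);
}

/* Bandwidth-bound pattern: setup is paid once, then downloads
   stream back to back. */
static double bandwidth_bound_time(int n, double s, double t)
{
    return s + n * t;
}
```

With s = t = 1 and seven transfers, the first pattern takes 14 units and the second 8: batching helps when setup (latency) dominates, while only more bandwidth shortens a run of back-to-back downloads.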

  12. Platform: Software View • [Figure: graphics, CPU, I/O, memory, misc., net]

  13. Platform: PCI, AGP • [Figure: two block diagrams of CPU, memory, disk, net, I/O, and graphics joined by glue logic; in the first, graphics sits on the PCI bus, in the second it attaches through AGP]

  14. Platform: UMA, Switched Hub • [Figure: two block diagrams, UMA and switched hub, each connecting CPU, memory, disk, net, I/O, and graphics through glue logic (labels include UMA and PCI)]

  15. Platform: The Points • Why learn about hardware? • To understand how your app interacts with it • To best utilize the hardware • Potentially to use extra hardware features • Where? • Platform documentation • Talk with hardware vendor

  16. CPU: Overview • CPU Operation • Data is transferred from main memory to registers • CPU works on data in registers • Latency (relative) • Registers: 0 (free) • Level-1 (L1) cache: 1 • Level-2 (L2) cache: 10x L1 • Main memory: 100x L1 • [Figure: CPU registers (R), L1, L2, main memory]

  17. CPU, Cache, and Memory • Caches designed to exploit data locality • Temporal locality • Spatial locality CPU Main Memory Registers L1 L2
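Spatial locality in a sketch: both functions below sum the same row-major grid, but only the first walks memory in storage order. The grid size and function names are illustrative assumptions; the two return identical results, and the difference appears only when they are timed. (Temporal locality is the complementary case: reusing data, like `sum`, while it is still in a register or cache.)

```c
#include <stddef.h>

/* Grid size is arbitrary; 512 x 512 doubles (2 MB) is likely larger
   than the L1 and L2 caches of many machines, so access order matters. */
enum { ROWS = 512, COLS = 512 };

/* Row-order walk over a row-major grid: consecutive accesses are
   adjacent in memory, so every fetched cache line is fully used. */
static double sum_row_order(const double *grid)
{
    double sum = 0.0;
    for (size_t i = 0; i < ROWS; ++i)
        for (size_t j = 0; j < COLS; ++j)
            sum += grid[i * COLS + j];
    return sum;
}

/* Column-order walk over the same grid: consecutive accesses are
   COLS * sizeof(double) bytes apart, so most of them land on a
   different cache line (poor spatial locality). */
static double sum_column_order(const double *grid)
{
    double sum = 0.0;
    for (size_t j = 0; j < COLS; ++j)
        for (size_t i = 0; i < ROWS; ++i)
            sum += grid[i * COLS + j];
    return sum;
}
```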

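  18. Memory: Cache & Logical Flow • In register? Yes: compute • In L1? Yes: copy to register (1), then compute • In L2? Yes: copy to L1 (10), then on to the register • Otherwise: copy from main memory to L2 (100), then on to the register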
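The flowchart can be read as a cost model. A minimal sketch, assuming the per-step copy costs on the slide (100, 10, 1) simply add up as data climbs the hierarchy; the enum and function names are hypothetical. These cumulative totals are only a rough model; slide 16 quotes order-of-magnitude latencies instead.

```c
/* Where the requested value currently lives. */
typedef enum { IN_REGISTER, IN_L1, IN_L2, IN_MAIN_MEMORY } Location;

/* Total relative cost to get a value into a register, following the
   flowchart: main memory -> L2 (100), L2 -> L1 (10), L1 -> register (1).
   Adding the steps together is a simplifying assumption. */
static int access_cost(Location where)
{
    int cost = 0;
    switch (where) {
    case IN_MAIN_MEMORY:
        cost += 100;          /* copy to L2 */
        /* fall through */
    case IN_L2:
        cost += 10;           /* copy to L1 */
        /* fall through */
    case IN_L1:
        cost += 1;            /* copy to register */
        /* fall through */
    case IN_REGISTER:
        break;                /* already in a register: compute */
    }
    return cost;
}
```

Under this model a value fetched from main memory costs 111 units against 1 from L1, which is why keeping working data in cache matters far more than shaving instructions.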

  19. Memory: Cache & Physical Flow • [Figure: data moves from main memory (by page) through the L2 cache and L1 cache into the CPU registers]

  20. Memory: Allocation & Pools • List elements are often allocated as needed • This scatters them across memory (poor spatial locality) • Mitigated by application-level memory management • Bad: malloc, malloc, malloc, malloc, ... • Good: pools - pool_init, pool_alloc, ... • Graphics example: • Vertices, normals, textures, etc.
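A minimal pool sketch, assuming fixed-size elements such as list nodes or vertices. pool_init and pool_alloc are the slide's names, but their signatures here, along with Pool and pool_destroy, are hypothetical. One malloc reserves a contiguous block, so elements allocated one after another are neighbors in memory. Passing sizeof of the element type as elem_size keeps every element suitably aligned, since malloc returns memory aligned for any object type.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fixed-capacity pool of equally sized elements carved from one block. */
typedef struct {
    unsigned char *base;   /* start of the contiguous block */
    size_t elem_size;      /* bytes per element */
    size_t capacity;       /* number of elements in the block */
    size_t used;           /* elements handed out so far */
} Pool;

/* Reserves room for `capacity` elements; returns 0 if malloc fails. */
static int pool_init(Pool *p, size_t elem_size, size_t capacity)
{
    p->base = malloc(elem_size * capacity);
    p->elem_size = elem_size;
    p->capacity = capacity;
    p->used = 0;
    return p->base != NULL;
}

/* Returns the next element (adjacent to the previous one),
   or NULL when the pool is exhausted. */
static void *pool_alloc(Pool *p)
{
    if (p->used == p->capacity)
        return NULL;
    return p->base + p->elem_size * p->used++;
}

/* Releases every element at once; individual frees are not supported. */
static void pool_destroy(Pool *p)
{
    free(p->base);
    p->base = NULL;
    p->used = p->capacity = 0;
}
```

A list built from pool_alloc'd nodes is then traversed with good spatial locality, and freeing the whole list is a single pool_destroy call.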

  21. Memory: Graphics! Vertex Arrays
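Vertex arrays apply the same idea to graphics data. A minimal sketch, assuming an interleaved position-plus-normal layout; the Vertex type and make_triangle are hypothetical. In OpenGL 1.1 such an array could be handed to glVertexPointer(3, GL_FLOAT, sizeof(Vertex), v[0].position) and glNormalPointer(GL_FLOAT, sizeof(Vertex), v[0].normal), using the struct size as the stride; the GL calls are omitted so the sketch compiles without a GL context.

```c
/* One interleaved vertex: position and normal stored together, so all
   the per-vertex data the pipeline reads is contiguous in memory. */
typedef struct {
    float position[3];
    float normal[3];
} Vertex;

/* Fills a contiguous array with one triangle in the z = 0 plane,
   all normals pointing along +z. */
static void make_triangle(Vertex v[3])
{
    static const float pos[3][3] = {
        {0.0f, 0.0f, 0.0f}, {1.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}
    };
    for (int i = 0; i < 3; ++i) {
        for (int k = 0; k < 3; ++k)
            v[i].position[k] = pos[i][k];
        v[i].normal[0] = 0.0f;
        v[i].normal[1] = 0.0f;
        v[i].normal[2] = 1.0f;
    }
}
```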

  22. Graphics: Pipe • [Figure: pipeline stages FIFO, xf, light, clip, rast, fx, fops] • xf: transform world to screen • light: apply lighting • clip: clip to view • rast: convert to pixels • fx: apply texture, etc. • fops: per-pixel tests & operations
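A sketch of the xf stage, assuming an OpenGL-style column-major model-view-projection matrix and a viewport whose origin is the lower-left corner. The types and function name are hypothetical; lighting and clipping, the next two stages, are not handled.

```c
/* A 4x4 matrix stored column-major, as OpenGL does: m[col * 4 + row]. */
typedef struct { float m[16]; } Mat4;

typedef struct { float x, y; } ScreenPoint;

/* Multiplies a point by the combined model-view-projection matrix,
   divides by w, and maps the normalized device coordinates [-1, 1]
   onto a width x height viewport. Assumes the point is in front of
   the eye (w > 0); culling it otherwise is the clip stage's job. */
static ScreenPoint xf_to_screen(const Mat4 *mvp, const float p[3],
                                int width, int height)
{
    float clip[4];
    for (int row = 0; row < 4; ++row)
        clip[row] = mvp->m[0 * 4 + row] * p[0]
                  + mvp->m[1 * 4 + row] * p[1]
                  + mvp->m[2 * 4 + row] * p[2]
                  + mvp->m[3 * 4 + row];        /* input w = 1 */
    float ndc_x = clip[0] / clip[3];
    float ndc_y = clip[1] / clip[3];
    ScreenPoint s = {
        (ndc_x + 1.0f) * 0.5f * (float)width,
        (ndc_y + 1.0f) * 0.5f * (float)height
    };
    return s;
}
```

With an identity matrix, the origin maps to the center of a 640 x 480 viewport, (320, 240), and (1, 1, 0) maps to its upper-right corner, (640, 480).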

  23. Graphics: Pipe & Akeley Taxonomy • G - Generate geometric data • T - Traverse data structures • X - Transform primitives world to screen • R - Rasterize triangles to pixels • D - Display framebuffer on output device

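  24. Graphics: Hardware • Four types of hardware are common (stages left of the hyphen run in software on the CPU, stages right of it in graphics hardware) • G-TXRD : all hardware • GT-XRD : transform, rasterize & display in hardware • GTX-RD : rasterize & display in hardware • GTXR-D : all software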
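A small decoder for the taxonomy labels, assuming that stages left of the hyphen run in software on the CPU and stages right of it in graphics hardware. The function is a hypothetical illustration of the notation, not part of any API.

```c
#include <string.h>

/* Returns 1 if `stage` ('G', 'T', 'X', 'R', or 'D') runs in graphics
   hardware under `label` (e.g. "GT-XRD"), 0 if it runs in software on
   the CPU, and -1 if the label is malformed or lacks that stage. */
static int stage_in_hardware(const char *label, char stage)
{
    if (stage == '\0' || stage == '-')
        return -1;                      /* not a stage letter */
    const char *hyphen = strchr(label, '-');
    const char *pos = strchr(label, stage);
    if (hyphen == NULL || pos == NULL)
        return -1;
    return pos > hyphen;                /* right of the hyphen: hardware */
}
```

For example, transform (X) is in hardware on a GT-XRD system but in software on GTX-RD.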

  25. Graphics: Performance • Benchmarks • “Trust, but verify.” - an ex-president • Definitions • Triangle rate: speed at which primitives are transformed (X) • Fill rate: speed at which primitives are rasterized (R) • Depth complexity: number of times each pixel is filled per frame • Caveats • Quantization, fastpath
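The definitions turn into simple budget arithmetic. A sketch, assuming the window size, average depth complexity, triangles per frame, and target frame rate are known; the function names are hypothetical and the numbers in the usage note are made up for illustration.

```c
/* Pixels that must be filled per second: every pixel of the window,
   times how often each is overdrawn, times frames per second. */
static double required_fill_rate(int width, int height,
                                 double depth_complexity, double fps)
{
    return (double)width * height * depth_complexity * fps;
}

/* Triangles that must be transformed per second. */
static double required_triangle_rate(double triangles_per_frame, double fps)
{
    return triangles_per_frame * fps;
}
```

A 1280 x 1024 window with depth complexity 3 at 60 frames per second needs about 236 million pixels filled per second, and 100,000 triangles per frame at 30 frames per second needs 3 million triangles per second. Comparing such figures against measured rates shows which limit a scene approaches first.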

  26. Graphics: Quantization • Frame quantization results from buffer swaps (swapbuffers) being deferred to the next vertical retrace • Syncing swaps to retrace is necessary to avoid image artifacts such as tearing • Example: 100 Hz display refresh

  27. Graphics: Quantization • [Figure: frame timelines against a 100 Hz display (t0 to t7, each interval 1/100 second; boxes mark one graphics frame); rows labeled no-sync 120 Hz, 100 Hz, 50 Hz, 50 Hz, 33 Hz]
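Quantization follows from rounding each frame up to a whole number of refresh intervals. A minimal sketch, assuming frame times in whole microseconds and a refresh interval truncated to whole microseconds; the function name is hypothetical.

```c
/* Effective frame rate when buffer swaps wait for vertical retrace.
   A frame that takes frame_us microseconds to draw occupies a whole
   number of refresh intervals, so the displayed rate is refresh_hz
   divided by that count. */
static double quantized_rate(long frame_us, long refresh_hz)
{
    long interval_us = 1000000L / refresh_hz;                   /* 10000 at 100 Hz */
    long intervals = (frame_us + interval_us - 1) / interval_us; /* round up */
    if (intervals < 1)
        intervals = 1;
    return (double)refresh_hz / (double)intervals;
}
```

On a 100 Hz display, a frame that would run at 120 Hz unsynchronized (about 8.3 ms) still displays at 100 Hz; one that takes 11 ms drops to 50 Hz, and one taking 25 ms drops to about 33 Hz.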

  28. Graphics: Fastpath • Definition • Fastpath: the most optimized path through graphics hardware • Example • Fast path: float verts, float norms, ABGR textures, z-test • Less fast path: float verts, float norms, RGBA textures, z-test

  29. Graphics: Fastpath Example

  30. Graphics: Fastpath Points • [Figure: a continuum from fast path (hardware) to slow path (software), plotted as speed vs. quality: "Where is your application?"] • Fast path is often synonymous with ideal path. • Real usage of graphics falls on a continuum. • Must quantify what hardware can do • Quality & speed

  31. Graphics Hardware: Testing • Reproduce published performance numbers simply: • Good: build a simple test program • Better: glPerf - http://www.spec.org • Maximize performance in an app: • Good: Use fast API extensions • Better: Create an “is-fast” test, use what is verified as fast

  32. Graphics Hardware: “Is-Fast” • Test each platform to determine its fast path • Once per machine, test primitives and modes • Vertex array format, texture format, display list, etc. • Store results in a database • Detect hardware changes or expiry of a time-to-live • Read data from database at startup • Check database or re-generate data

  33. Graphics Hardware: “Is-Fast” • Pseudo-code

      if ( new_machine() || hardware_changed() ) {
          test_interesting_modes();
          store_in_database();
      } else {
          // have database entry
          get_performance_data_from_database();
      }
      // use the modes & primitives that are "fast" when rendering
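A runnable sketch of the pseudo-code, with an in-memory record standing in for the database; a real version would persist the record to disk and also honor a time-to-live. PerfRecord, choose_fast_mode, and example_measure are hypothetical names, and example_measure returns made-up rates in place of real timed rendering.

```c
#include <stdio.h>
#include <string.h>

/* Number of hypothetical modes to probe; a real list would cover
   vertex-array formats, texture formats, display lists, etc. */
enum { MODE_COUNT = 3 };

/* One cached database record: which hardware it describes and the
   measured rate (e.g. triangles/second) for each mode. */
typedef struct {
    char   hw_id[64];
    double rate[MODE_COUNT];
    int    valid;
} PerfRecord;

/* Timing callback: renders with `mode` and returns its rate. */
typedef double (*MeasureFn)(int mode);

/* Re-measures only when there is no record or the hardware changed;
   otherwise reuses the cached numbers. Returns the fastest mode and
   sets *measured to 1 if tests ran, 0 if the cache was used. */
static int choose_fast_mode(PerfRecord *db, const char *hw_id,
                            MeasureFn measure, int *measured)
{
    *measured = 0;
    if (!db->valid || strcmp(db->hw_id, hw_id) != 0) {
        for (int m = 0; m < MODE_COUNT; ++m)
            db->rate[m] = measure(m);                  /* test_interesting_modes() */
        snprintf(db->hw_id, sizeof db->hw_id, "%s", hw_id);
        db->valid = 1;                                 /* store_in_database() */
        *measured = 1;
    }
    int best = 0;
    for (int m = 1; m < MODE_COUNT; ++m)
        if (db->rate[m] > db->rate[best])
            best = m;
    return best;
}

/* Placeholder measurement with made-up rates, for illustration only. */
static double example_measure(int mode)
{
    static const double fake_rates[MODE_COUNT] = { 1.0e6, 2.5e6, 0.4e6 };
    return fake_rates[mode];
}
```

Keying the record on a hardware identifier is what makes the "hardware_changed()" check cheap: a mismatch triggers a fresh round of tests, and a match skips them entirely at startup.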

  34. Think Globally, Act Locally • Think globally • Know the platforms & graphics hardware • Use hardware effectively in your app • Balance hardware utilization • Act locally • Use in-cache data • Understand hardware & graphics fastpaths • Balance quality vs. performance

  35. Software and System Performance Thomas J. True, SGI

  36. A Four Step Process • Quantify • System Evaluation • Graphics Analysis • Bottleneck Elimination

  37. Quantify • Characterize • Application Space • Primitive Types • Primitive Counts • Rendering Characteristics • Frame Rate

  38. Quantify • Compare

  39. Examine System Configuration • Resources • Memory • Disk • Setup • Display • Network

  40. Graphics Analysis • Ideal Performance • Keep graphics pipeline full. • 100% CPU utilization running application code. • 100% graphics utilization.

  41. Graphics Analysis • Graphics Bound • [Figure: two utilization gauges, scale 0 to 100]

  42. Graphics Analysis • Graphics Bound • Graphics subsystem processes data slower than CPU can feed it. • Graphics subsystem issues an interrupt which causes the CPU to stall. • Data processing within application stops until graphics subsystem can again accept data.

  43. Graphics Analysis • Geometry Limited • Limited by the rate at which vertices can be transformed and clipped. • Fill Limited • Limited by the rate at which transformed vertices can be rasterized.

  44. Graphics Analysis • CPU Bound • [Figure: two utilization gauges, scale 0 to 100]

  45. Graphics Analysis • CPU Bound • CPU at 100% utilization but can’t feed graphics fast enough. • Graphics subsystem at less than 100% utilization. • All CPU cycles consumed by data processing.

  46. Graphics Analysis • Determination Techniques • Remove graphics API calls. • Shrink graphics window. • Reduce geometry processing requirements. • Use system monitoring tool.
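A minimal sketch of how the first three determination techniques narrow down the cause, assuming each experiment yields a yes/no "did the frame rate increase?" answer. It covers only the main branches of the flowchart on slide 47; the types and function name are hypothetical.

```c
/* Results of the experiments: 1 = frame rate increased, 0 = no change. */
typedef struct {
    int faster_without_graphics_calls;  /* graphics API calls removed */
    int faster_with_smaller_window;     /* graphics window shrunk */
    int faster_with_less_geometry;      /* geometry load reduced */
} Observations;

typedef enum {
    NOT_GRAPHICS,          /* look elsewhere, e.g. with a system monitor */
    FILL_LIMITED,
    GEOMETRY_LIMITED,
    GRAPHICS_UNCLASSIFIED  /* graphics bound, cause not isolated
                              (e.g. fallen off the fast path) */
} Diagnosis;

static Diagnosis diagnose(const Observations *o)
{
    if (!o->faster_without_graphics_calls)
        return NOT_GRAPHICS;      /* removing graphics work did not help */
    if (o->faster_with_smaller_window)
        return FILL_LIMITED;      /* fewer pixels -> faster: rasterization bound */
    if (o->faster_with_less_geometry)
        return GEOMETRY_LIMITED;  /* fewer vertices -> faster: transform bound */
    return GRAPHICS_UNCLASSIFIED;
}
```

Testing window size before geometry mirrors the order on slide 46; since shrinking the window touches only rasterization, it isolates fill limits most cleanly.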

  47. Graphics Analysis • [Flowchart: starting from a performance problem, remove graphics API calls, shrink the graphics window, reduce the geometry load, remove rendering calls, and use a system monitoring tool, checking each time whether the frame rate increases or does not change. Outcomes: not graphics; graphics bound: fill limited; graphics bound: geometry limited; graphics bound: ?; fallen off fast path; excessive or unexpected CPU activity.]

  48. Graphics Analysis • Graphics Architecture: GTXR-D

  49. Graphics Analysis • Graphics Architecture: GTXR-D • (aka Dumb Frame Buffer) • CPU does everything. • Typically CPU bound. • To remedy, buy a “real” graphics board.

  50. Graphics Analysis • Graphics Architecture: GTX-RD
