Scalability

Presentation Transcript


  1. Scalability Advanced D3D Programming Richard Huddy RichardH@nvidia.com Scalability - R Huddy

  2. Basic Objectives
     • To produce the best experience on every user's machine
     • To exploit all of the resources available
     • To cope with a broad spread of hardware
     • To avoid ‘bottoming out’ during the shelf-life of the game / engine

  3. What is a high-end PC?
     • A 125+ mega-texel device
     • A 125+ mega-pixel device
     • A fast CPU (>= 350MHz)
     • AGP 2X/4X bus
     • Lots of system RAM (>= 64MB)
     • Huge frame buffers (16 to 32 MB)
     • Multi-texture at low cost

  4. Power Trends
     [Chart: CPU speed and fill rate trends]
     Appreciate the absolute values and the ratios.

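  5. So what’s the problem?
     [Timeline diagram, second generation hardware: from BeginScene() to EndScene(), a CPU row with work items A, B, C and a Graphics row with the matching items a, b, c.]
     [Timeline diagram, third generation hardware, captioned “Wow, 10% faster!”: the same CPU row (A, B, C, EndScene()) and Graphics row (a, b, c).]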

  6. What can you do to help?
     Scalability is the key:
     • Run at higher screen resolutions
     • Run at higher color depths
     • Use more complex rendering techniques on good hardware
     • Ship multiple geometry models
     • Protect your CPU
     • Unlock the frame rate

  7. Higher Screen Resolutions
     1) Include direct support for higher resolution modes (uses lots of disk space).
     2) Store high resolution art and filter down to produce lower resolution art.
     3) Store low resolution art and pixel double:
        If you have art at 512x384, use it for 1024x768.
        If you have art at 640x480, use it on 1280x1024 (but only use a 1280x960 viewport).
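
The pixel-doubling arithmetic on this slide can be sketched in a few lines of C. This is only an illustration: the `Viewport` struct and both helper names are hypothetical (they are not D3D types), and centring the viewport in the leftover space is an assumption that the slide does not state.

```c
/* Hypothetical viewport rectangle; not a D3D type. */
typedef struct { int x, y, width, height; } Viewport;

/* Largest whole-number scale at which the art still fits the screen mode. */
static int integer_scale(int art_w, int art_h, int screen_w, int screen_h)
{
    int sx = screen_w / art_w;
    int sy = screen_h / art_h;
    return sx < sy ? sx : sy;
}

/* Viewport covering the scaled art, centred in the screen mode (assumption). */
static Viewport doubled_viewport(int art_w, int art_h, int screen_w, int screen_h)
{
    int s = integer_scale(art_w, art_h, screen_w, screen_h);
    Viewport vp;
    vp.width  = art_w * s;
    vp.height = art_h * s;
    vp.x = (screen_w - vp.width) / 2;
    vp.y = (screen_h - vp.height) / 2;
    return vp;
}
```

For 640x480 art on a 1280x1024 mode this gives a scale of 2 and a 1280x960 viewport, leaving 64 unused rows that the sketch splits evenly above and below.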

  8. Higher Color Depths
     • Runs at much the same speed but gives the user a much richer experience
     • Uses frame buffer memory constructively
     • You can re-use the previous 16 bit assets
     • The main performance loss in true color is often due to texture management
     But beware the Frame Buffer + Z Buffer depth constraint on Riva TNT.

  9. Complex Rendering Techniques - I
     • Environment Mapping
        • Beware of spending too much CPU on this.
     • Dual Texture Lighting
     • Bump Mapping
     • Use more alpha transparency
        • But see also “Alpha sort issues” later on…
     Please try to use the extra fill rate!

  10. Complex Rendering Techniques - II
      • Trilinear mipmapping for almost everything
      • Use detail textures
      • Large textures for extra realism
      • 32 bit textures - where it’s a quality win
      • Compressed textures as long as quality is not compromised

  11. Protect your CPU
      The big ones:
      • __ftol and other ‘type conversion’ nightmares
      • sqrt()
         • that’ll be seventy cycles please...
      • Reciprocal square root
         • One hundred and nine cycles through the FPU…
      • Transform and lighting (more on that later)

  12. Removing __ftol
      • Remember that the compiler doesn’t have a choice (a C float-to-int cast must truncate), but you can check the output
      • Write your own inline assembler conversion routine if…
         • You can accept differing rounding rules
      This doesn’t break the optimiser!
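
As a rough illustration of a conversion with "differing rounding rules", the sketch below rounds to nearest instead of truncating. It is a portable stand-in, not the inline assembler routine the slide refers to. The function name `round_to_int` is hypothetical, and the trick assumes IEEE 754 doubles, the default round-to-nearest mode, double-precision evaluation (for example SSE2 code generation), no aggressive floating-point optimisation flags, and inputs smaller in magnitude than 2^31.

```c
#include <stdint.h>
#include <string.h>

/* Round-to-nearest conversion that avoids a truncating cast.
   Adding 1.5 * 2^52 pushes the integer part of x into the low
   mantissa bits of the double, where it can be read back directly. */
static int32_t round_to_int(double x)
{
    double biased = x + 6755399441055744.0;   /* 1.5 * 2^52 */
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &biased, sizeof bits);      /* aliasing-safe bit copy */
    low = (uint32_t)bits;                     /* low 32 mantissa bits */
    memcpy(&result, &low, sizeof result);     /* reinterpret as signed */
    return result;
}
```

Note the rounding difference: a cast turns 2.7 into 2, whereas this routine returns 3, and exact halves go to the nearest even integer. Code that relies on truncation needs to account for that.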

  13. Replacement for sqrt()
      • Sqrt seems ‘natural’ if you are normalising vectors, calculating environment map coordinates or calculating distances - but it’s sloooow
      • Sample code is available from the developer web site or from me directly and will be in future versions of the SDK.
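
The sample code mentioned on this slide is not reproduced in the transcript. As a stand-in, here is one widely known approximation of the reciprocal square root (an integer bit trick followed by one Newton-Raphson step), applied to vector normalisation. It is not necessarily what the referenced sample does. The names `approx_rsqrt` and `normalize3` are hypothetical, and the code assumes 32-bit IEEE 754 floats and positive, finite inputs.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive finite x; relative error is
   roughly 0.2% or better after the single refinement step. */
static float approx_rsqrt(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);            /* read the float's bit pattern */
    i = 0x5f3759dfu - (i >> 1);          /* initial guess from the exponent */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y);   /* one Newton-Raphson step */
    return y;
}

/* Normalise a 3-vector in place without calling sqrt(). */
static void normalize3(float v[3])
{
    float inv_len = approx_rsqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    v[0] *= inv_len;
    v[1] *= inv_len;
    v[2] *= inv_len;
}
```

Whether this beats the hardware depends on the CPU, so it is worth measuring before adopting it.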

  14. Saturation Arithmetic (C)
      Limiting a floating point number to lie in the range 0.0 to 1.0 inclusive (traditional method):
          if (f < 0.0) f = 0.0;
          else if (f > 1.0) f = 1.0;

  15. Saturation Arithmetic (Pentium)
          if (*(long *)&f < 0) *(long *)&f = 0;
          else if (*(long *)&f > 0x3f800000) *(long *)&f = 0x3f800000;
      (0x3f800000 is the IEEE 754 bit pattern of 1.0f; the casts assume a 32-bit long.)
      • This is faster on a Pentium class processor since the FPU is “non-optimal” (i.e. slow) and the integer unit is much faster.
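
The pointer casts on this slide rely on `long` being 32 bits wide and on the compiler tolerating type-punned access. A hedged, more portable sketch of the same integer-compare idea uses `int32_t` and `memcpy` instead. The function name `saturate_bits` is hypothetical, and 32-bit IEEE 754 floats are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Clamp f to [0.0, 1.0] using integer compares on its bit pattern. */
static float saturate_bits(float f)
{
    int32_t bits;

    memcpy(&bits, &f, sizeof bits);
    if (bits < 0)
        bits = 0;                    /* sign bit set: negative (or -0.0) */
    else if (bits > 0x3f800000)
        bits = 0x3f800000;           /* above 1.0f: clamp to 1.0f */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The trick works because non-negative IEEE 754 floats order the same way as their bit patterns read as integers. NaNs are not treated specially: they are clamped to 0.0 or 1.0 depending on their sign bit.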

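  16. Saturation Arithmetic (Pentium II)
      • Use the “cmov” instructions:
            mov   eax, [f]          ; cmov needs register operands
            xor   ecx, ecx          ; bit pattern of 0.0
            mov   edx, 3f800000h    ; bit pattern of 1.0
            cmp   eax, ecx
            cmovl eax, ecx          ; negative: clamp to 0.0
            cmp   eax, edx
            cmovg eax, edx          ; above 1.0: clamp to 1.0
            mov   [f], eax
      Faster since unpredictable branches are the bottleneck here. Unavailable on a Pentium.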

  17. Unlock the Frame Rate
      • It’s essential that your physics model can run at high refresh rates.
         • At least 100fps
      • 30 or 60 fps limits are not acceptable and lead to flat performance on high end hardware
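
One common way to keep physics independent of an unlocked render rate is a fixed-timestep accumulator. The slide does not prescribe this approach, so treat the following as a sketch under assumptions: the `Body` struct, the function names, and the 120 Hz step are all hypothetical choices.

```c
/* Hypothetical one-dimensional game state. */
typedef struct { double position, velocity; } Body;

/* Assumed physics step; any rate of 100 Hz or more fits the slide's advice. */
#define PHYSICS_DT (1.0 / 120.0)

static void physics_step(Body *b, double dt)
{
    b->position += b->velocity * dt;
}

/* Consume frame_time seconds in fixed physics steps. Returns the number of
   steps run; leftover time stays in *accumulator for the next frame. */
static int advance(Body *b, double *accumulator, double frame_time)
{
    int steps = 0;

    *accumulator += frame_time;
    while (*accumulator >= PHYSICS_DT) {
        physics_step(b, PHYSICS_DT);
        *accumulator -= PHYSICS_DT;
        ++steps;
    }
    return steps;
}
```

With this structure, a renderer running faster than the physics rate simply runs zero physics steps on some frames, while a slower renderer runs several per frame, so the frame rate never needs to be capped for the simulation's sake.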

  18. The Value of Batching
      Case Specifics:
      • The average # of ‘Polys Per Call’ (PPC) to DrawPrimitive was 2.6, producing 40fps
      • Removing state changes to raise the average PPC to ~50 produced 58fps
      • Most of the removed state changes were “reasonable”, i.e. not logically redundant
      • The changes did not reduce visual quality at all
      • PPC of 200 is optimal
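
The effect of grouping work by render state can be sketched as follows. This is a simplified model, not the case study's actual code: the `DrawItem` struct and function names are hypothetical, texture is the only state considered, and a real engine would also have to merge the underlying vertex data.

```c
#include <stdlib.h>

/* Hypothetical record for one pending draw: its texture and poly count. */
typedef struct { int texture_id; int poly_count; } DrawItem;

static int by_texture(const void *a, const void *b)
{
    const DrawItem *x = a;
    const DrawItem *y = b;
    return (x->texture_id > y->texture_id) - (x->texture_id < y->texture_id);
}

/* Sort by texture, then merge neighbours that share a texture.
   Returns the number of batches left in items[0..result). */
static size_t batch_by_texture(DrawItem *items, size_t n)
{
    size_t out = 0, i;

    if (n == 0)
        return 0;
    qsort(items, n, sizeof items[0], by_texture);
    for (i = 1; i < n; ++i) {
        if (items[i].texture_id == items[out].texture_id)
            items[out].poly_count += items[i].poly_count;  /* same state: merge */
        else
            items[++out] = items[i];                       /* state change: new batch */
    }
    return out + 1;
}
```

Dividing the total poly count by the returned batch count gives the average polys per call, which is the figure the slide tracks.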

  19. Alpha Sort Issues
      The “standard” solution is…
      1) Draw all non-alpha polys (sort by texture).
      2) Draw all alpha polys in back to front order with Z compare enabled and Z update disabled.
      This copes with overlapping alpha polys but you can’t sort by texture. (Intersection requires decimation.)
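
The two-step draw order can be expressed as a single sort comparator. The sketch below is illustrative only: the `Poly` struct and its fields are hypothetical, and `view_depth` is assumed to grow with distance from the camera.

```c
#include <stdlib.h>

/* Hypothetical per-polygon sort key. */
typedef struct {
    int   texture_id;
    float view_depth;   /* assumed: larger means farther from the camera */
    int   uses_alpha;   /* 0 = opaque, 1 = alpha blended */
} Poly;

/* Opaque polys first, ordered by texture; then alpha polys, farthest first. */
static int draw_order(const void *a, const void *b)
{
    const Poly *x = a;
    const Poly *y = b;

    if (x->uses_alpha != y->uses_alpha)
        return x->uses_alpha - y->uses_alpha;
    if (!x->uses_alpha)
        return (x->texture_id > y->texture_id) - (x->texture_id < y->texture_id);
    return (x->view_depth < y->view_depth) - (x->view_depth > y->view_depth);
}

static void sort_for_drawing(Poly *polys, size_t n)
{
    qsort(polys, n, sizeof polys[0], draw_order);
}
```

The Z compare and Z update settings for the alpha pass are render state and are not modelled here.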

  20. Alpha Sort with Bounding Boxes
      When you are ready to draw your alpha polys, draw non-overlapping sets using the sort-by-texture technique as before.
      [Diagram: a viewport containing bounding boxes A, B and C.]
      Here, you can safely draw all of A before any of B or C… B and C need sorting.
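
A minimal sketch of the overlap test behind this idea, assuming each alpha set has an axis-aligned screen-space bounding rectangle. The `Rect` struct and both function names are hypothetical.

```c
/* Hypothetical screen-space bounding rectangle of one set of alpha polys. */
typedef struct { float min_x, min_y, max_x, max_y; } Rect;

/* True when the two rectangles share some screen area. */
static int rects_overlap(Rect a, Rect b)
{
    return a.min_x < b.max_x && b.min_x < a.max_x &&
           a.min_y < b.max_y && b.min_y < a.max_y;
}

/* A set that overlaps no other set can be drawn on its own, sorted by
   texture, without depth-sorting it against the rest. */
static int is_isolated(const Rect *sets, int count, int index)
{
    int i;

    for (i = 0; i < count; ++i)
        if (i != index && rects_overlap(sets[index], sets[i]))
            return 0;
    return 1;
}
```

In the slide's picture, A would pass this test, while B and C would fail it against each other and so still need back to front sorting.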

  21. Geometry - Part 1
      • Use the DX6 Transform and Clip engine - it’ll be nearly as fast as your best efforts
      • It takes advantage of CPU specific optimisations done by Intel, AMD etc.
      • It uses the guard band clipping region to enhance performance
      • Use the DX7 interface ASAP

  22. Geometry - Part 2
      • This gets you ready for hardware which can do the job much faster than the CPU
      • Tell the chip designers if you need anything non-standard
      • If you think DX is too slow then use a run-time benchmark to select between DX and your own code
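
A run-time benchmark of the kind suggested in the last bullet might look like the sketch below. Everything here is a placeholder: `TransformFn`, `path_a` and `path_b` are hypothetical stand-ins for the DX path and a custom path (both just copy data), and `clock()` is a coarse timer, so a real benchmark would use a representative workload and enough repeats to get a measurable difference.

```c
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by both geometry paths. */
typedef void (*TransformFn)(const float *in, float *out, int count);

/* Placeholder for the API path. */
static void path_a(const float *in, float *out, int count)
{
    int i;
    for (i = 0; i < count; ++i)
        out[i] = in[i];
}

/* Placeholder for hand-written code. */
static void path_b(const float *in, float *out, int count)
{
    memcpy(out, in, (size_t)count * sizeof in[0]);
}

/* CPU time in seconds spent running fn the given number of times. */
static double time_path(TransformFn fn, const float *in, float *out,
                        int count, int repeats)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < repeats; ++i)
        fn(in, out, count);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time both paths on the same data and keep the faster one (a wins ties). */
static TransformFn pick_faster(TransformFn a, TransformFn b, const float *in,
                               float *out, int count, int repeats)
{
    double ta = time_path(a, in, out, count, repeats);
    double tb = time_path(b, in, out, count, repeats);
    return tb < ta ? b : a;
}
```

Running the selection once at start-up and storing the returned function pointer keeps the per-frame cost at a single indirect call.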

  23. Geometry - Part 3
      • Use the DX pipeline for geometry which may be rendered
      • Use your own transform for bounding boxes, collisions, portals etc.
      • Treat hardware T&L as
         • Write only
         • Not necessarily pixel identical to CPU T&L
      DIPVB()

  24. Geometry - Part 4
      • Consider choosing between models at game start-up time
      • More complex geometry should be several times more complex
      • Introduce some LOD management
      • Your artists are probably generating more complex models and then throwing them away
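
A very small sketch of distance-based LOD selection, one possible form of the "LOD management" mentioned above. The function name and the threshold table are hypothetical, and level 0 is assumed to be the most detailed model.

```c
/* Pick a level of detail from the distance to the camera.
   max_distance[i] is the farthest distance at which level i is still used;
   the last level has no limit, so only lod_count - 1 entries are read. */
static int select_lod(float distance, const float *max_distance, int lod_count)
{
    int i;

    for (i = 0; i < lod_count - 1; ++i)
        if (distance <= max_distance[i])
            return i;
    return lod_count - 1;
}
```

The same table could be scaled at start-up according to the machine's measured power, so that faster machines keep the detailed models for longer.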

  25. Lighting - Part 1
      • If the DX Lighting model is good enough then there are people who want to help you
      • Multi-texture shadow maps and light maps can be very fast now
         • remember that (multi-pass != multi-texture)
      • Tell the chip companies what you need

  26. Lighting - Part 2
      • Support more lights
      • Use a richer set of light types
      • Scale with available power
      • If you have more complex geometry you get better lighting quality

  27. Summary
      • Use the D3D pipeline as much as possible
      • ‘Use’ the CPU carefully - ‘Abuse’ the fill rate
      • Get on board with DX7
      • Offer the richest experience possible
      • You may have to treat the PC as two distinct platforms, ‘High-end’ and ‘Low-end’

  28. Questions?
      Richard Huddy
      RichardH@nvidia.com
      www.nvidia.com