Sony Computer Entertainment Development Conference 2nd - 3rd August 2001

GS Master Class. Mark Breugelmans. What we know about the GS: GS memory is 4 MB; GS fill rate is 1.2 gigapixel/sec (textured); GS input bandwidth is 64 bit; we can stream up to 1.2 gigabyte a second.


  1. Sony Computer Entertainment Development Conference, 2nd - 3rd August 2001

  2. GS Master Class • Mark Breugelmans

  3. What we know about the GS • GS memory is 4 MB • GS fill rate is 1.2 gigapixel/sec (textured) • GS input bandwidth is 64 bit • We can stream up to 1.2 gigabyte a second • GS polygon throughput is determined by: • Set-up time (number of cycles per vertex) • Polygon size (number of pixels to draw)

  4. Getting data in • GS runs at 150 MHz but with only a 64 bit input • That’s around 24 megabyte/frame (PAL) to be shared between textures and geometry • Geometry • Use strips for fastest geometry set-up • Textures • Always pack 4, 8 and 16 bit textures into 32 bit format beforehand for fastest transfer.
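
The per-frame figure on this slide follows from the clock and bus width. A minimal sketch of the arithmetic, assuming one 64 bit transfer per GS cycle and a 50 Hz PAL frame rate (the function names are illustrative, not from the presentation):

```python
# Sketch: GS input budget per frame, assuming one 64-bit transfer per cycle.
GS_CLOCK_HZ = 150_000_000   # 150 MHz, from the slide
BUS_WIDTH_BYTES = 64 // 8   # 64-bit input path
PAL_FRAMES_PER_SECOND = 50  # assumption: full-rate PAL

def bytes_per_second():
    """Peak input bandwidth in bytes per second."""
    return GS_CLOCK_HZ * BUS_WIDTH_BYTES

def megabytes_per_frame(frames_per_second):
    """Peak input budget per frame, in units of 10^6 bytes."""
    return bytes_per_second() / frames_per_second / 1_000_000

print(bytes_per_second())                          # 1200000000
print(megabytes_per_frame(PAL_FRAMES_PER_SECOND))  # 24.0
```

This is a ceiling: the budget is shared between textures and geometry, and the measured texture rates on the next slide are lower.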

  5. Texture Transfer Rates • Theoretical rate is 1.2 gigabyte/sec • Transfer rates by texture format are listed below, with path3 measured values alongside • Sample code shows you how to convert

     Texture format    Transfer rate (Megabyte/sec)    Measured on path3 (Megabyte/sec)
     32, 24, 16bit     1200                            1065
     8bit              900                             799
     4bit              600                             383

  6. Small triangles and set-up time • At most 8 textured pixels are drawn per cycle • Up to 8x4 pixels can be drawn in the set-up time • The GS is not very efficient for tiny triangles

  7. Small triangles and Fill-rate • Pixels are drawn by the GS in groups of 8 • Small triangles will not make use of this

     Triangle size    Pixels drawn/cycle
     1x1              0.12
     2x2              0.5
     4x4              2
     8x8              5.27
     16x16            6.13
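
To put the pixels-per-cycle figures in context, the sketch below converts them to an approximate fill rate and to a fraction of the 8 pixels/cycle peak. It assumes the 150 MHz clock quoted earlier; the helper names are illustrative.

```python
# Sketch: convert measured pixels/cycle into fill rate and efficiency.
GS_CLOCK_MHZ = 150         # from slide 4
PEAK_PIXELS_PER_CYCLE = 8  # textured peak, from slide 6

# Values copied from the slide.
MEASURED = {"1x1": 0.12, "2x2": 0.5, "4x4": 2.0, "8x8": 5.27, "16x16": 6.13}

def fill_rate_mpixels(pixels_per_cycle):
    """Approximate fill rate in megapixels per second."""
    return pixels_per_cycle * GS_CLOCK_MHZ

def efficiency(pixels_per_cycle):
    """Fraction of the peak pixel rate actually achieved."""
    return pixels_per_cycle / PEAK_PIXELS_PER_CYCLE

for size, ppc in MEASURED.items():
    print(f"{size:>5}: {fill_rate_mpixels(ppc):7.1f} Mpixel/s, "
          f"{efficiency(ppc):.1%} of peak")
```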

  8. Fill rate factors • Triangle size • Texture to pixel size • Texture filtering modes (Tri-linear, mip-maps) • Fog • Caches • Texture page buffer • Frame/Z page buffer

  9. Frame/Z Buffer Page caches • Frame and Z-Buffer: 8k • Split into 2 buffers: 32x32x32bit = 4k each • Page refill is very fast • 8192 bits per cycle (about 150 gigabyte/sec bandwidth!) • Whole 8k page buffer refilled in 8 cycles (Figure: Frame page 32x32 and Z Buffer page 32x32)
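
The refill numbers on this slide are consistent with each other, as the following sketch shows. It assumes the 150 MHz GS clock from slide 4; the names are illustrative.

```python
# Sketch: check the page-buffer refill arithmetic from the slide.
GS_CLOCK_HZ = 150_000_000     # from slide 4
REFILL_BITS_PER_CYCLE = 8192  # from the slide
PAGE_BUFFER_BYTES = 8 * 1024  # frame page + Z page

def page_bytes(width, height, bits_per_pixel):
    """Size in bytes of one page of the given dimensions."""
    return width * height * bits_per_pixel // 8

def refill_cycles(buffer_bytes):
    """Cycles needed to refill a buffer at the quoted refill width."""
    return buffer_bytes // (REFILL_BITS_PER_CYCLE // 8)

def refill_bandwidth_gb_per_s():
    """Refill bandwidth in units of 10^9 bytes per second."""
    return (REFILL_BITS_PER_CYCLE // 8) * GS_CLOCK_HZ / 1e9

print(page_bytes(32, 32, 32))            # 4096 bytes per 32x32x32bit page
print(refill_cycles(PAGE_BUFFER_BYTES))  # 8
print(refill_bandwidth_gb_per_s())       # roughly 150
```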

  10. Frame/Z Page Cache misses • Frame/Z Page cache will get filled line by line as drawing scans down • Fill rate while varying height is roughly constant • Fill rate while varying width varies with page misses • Cache misses for the Frame/Z page don’t drop fill-rate much below 1 gigapixel/sec. • Textures are usually more of a problem

  11. Fill-rate vs. Triangle size

  12. Level of detail • As polygon counts head into the millions, polygon sizes in pixels shrink rapidly • PA scans of games suggest that better use of LOD would benefit some games significantly. • The back of a 5000 polygon car may result in just 50 visible pixels once projected onto the screen. • Similarly, there’s no point having detailed textures that are going to be shrunk so much

  13. A pixel density test • Set all vertices to: • red=0, green=1, blue=0 • alpha blend=destination + source • z test = disabled • texture = disabled • Lighter areas show you where there is high density or overdraw
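
The test works because additive blending turns the frame buffer into an overdraw counter. The sketch below simulates that on the CPU, using axis-aligned rectangles in place of triangles for brevity; the function and its arguments are hypothetical, not GS code.

```python
# Sketch: simulate the additive "pixel density" test on a small frame buffer.
# Each primitive adds 1 to the green channel of every pixel it covers, so the
# stored value is the number of times that pixel was drawn.
def overdraw_map(width, height, rects):
    """rects are (x0, y0, x1, y1) with exclusive upper bounds."""
    green = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                # destination + source, clamped like an 8-bit channel
                green[y][x] = min(green[y][x] + 1, 255)
    return green

for row in overdraw_map(4, 2, [(0, 0, 2, 2), (1, 0, 4, 1)]):
    print(row)
```

Brighter values mark pixels that were drawn more than once, which is where overdraw or very dense geometry is costing fill rate.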

  14. Texture Page caches • Texture cache: 8k • Page size by texture format: • 32bit 64x32 (also used for 24, 8H, 4HL, 4HH) • 16bit 64x64 • 8bit 128x64 • 4bit 128x128
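
Each page size on this slide works out to the same 8k of storage, which is why lower bit depths cover more texels per page. A quick check (the table name is illustrative):

```python
# Sketch: every texture page format fills the same 8k cache.
PAGE_DIMENSIONS = {   # bits per texel -> (width, height), from the slide
    32: (64, 32),
    16: (64, 64),
    8: (128, 64),
    4: (128, 128),
}

def page_bytes(bits_per_texel):
    """Bytes occupied by one page at the given bit depth."""
    width, height = PAGE_DIMENSIONS[bits_per_texel]
    return width * height * bits_per_texel // 8

for bpp in PAGE_DIMENSIONS:
    print(f"{bpp:>2}bit page: {page_bytes(bpp)} bytes")
```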

  15. Texture Cache misses example • 64x32 sprite, 24bit texture • One pixel outside the page halves fill rate! • Texture cache miss is based on the texture co-ordinates, not the original texture size • Crossing texture pages also affects the cache

      Texture size    Fill-rate    GS cycles
      64x32           1158         262
      65x32           596          514
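
Because misses depend on the texture co-ordinates used, it can help to count how many pages a co-ordinate range touches. A minimal sketch, assuming integer texel co-ordinates and a texture whose origin is aligned to a page boundary (both are assumptions, and the function is hypothetical):

```python
# Sketch: count texture pages touched by a texel co-ordinate range.
def pages_touched(u_min, u_max, v_min, v_max, page_w, page_h):
    """Ranges are half-open: [u_min, u_max) by [v_min, v_max)."""
    cols = (u_max - 1) // page_w - u_min // page_w + 1
    rows = (v_max - 1) // page_h - v_min // page_h + 1
    return cols * rows

# 24bit textures use a 64x32 page.
print(pages_touched(0, 64, 0, 32, 64, 32))  # 1
print(pages_touched(0, 65, 0, 32, 64, 32))  # 2
```

The second call shows the case from the slide: a single extra texel column pulls in a second page.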

  16. Crossing Texture Pages efficiently • The blocks in the pages are zig-zagged in 1/4s, 1/16s etc. for efficiency. • Use at most 1/2 page width and height to avoid crossing 3 quarters, which causes many block reloads / page misses (Figure: one primitive crossing 2 quarters, one crossing 3 quarters)

  17. Recommended subdivision • PA scans showing GS wait for texture • Suggested subdivision for each texture mode: (Figure: a 256x256 4bit texture, not subdivided and subdivided)

      Texture mode (page size)    Subdivision
      4bit (128x128)              64x64
      8bit (128x64)               64x32
      16bit (64x64)               32x32
      24/32bit (64x32)            32x16
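
The suggested subdivision sizes are half a page in each direction. The sketch below splits a texel range into tiles of that size; it is an illustration of the idea only, and the function and table names are hypothetical.

```python
# Sketch: split a texture's texel range into half-page tiles.
SUBDIVISION = {   # bits per texel -> (tile width, tile height), from the slide
    4: (64, 64),
    8: (64, 32),
    16: (32, 32),
    24: (32, 16),
    32: (32, 16),
}

def subdivide(width, height, bits_per_texel):
    """Return (u0, v0, u1, v1) tiles covering a width x height texel range."""
    tile_w, tile_h = SUBDIVISION[bits_per_texel]
    return [
        (u, v, min(u + tile_w, width), min(v + tile_h, height))
        for v in range(0, height, tile_h)
        for u in range(0, width, tile_w)
    ]

tiles = subdivide(256, 256, 4)
print(len(tiles))  # 16 tiles of 64x64 for a 256x256 4bit texture
print(tiles[0])    # (0, 0, 64, 64)
```

Each tile would then be drawn as its own primitive so that its co-ordinates stay within the recommended range.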

  18. Reducing texture cache misses • Use 4bit or 8bit textures • Clamp texture to page size to keep in page • Bilinear may fetch 1 pixel outside your co-ordinate range. • Either/Or • Keep all textures within one page • Sub-divide polygons until the ST co-ordinates of each polygon stay within half a cache page

  19. Texture reduction penalty

  20. Mip-maps • Good for avoiding texture reduction • Look better • May help reduce texture transfers for distant drawing • Watch out for performance on large polygons • mip-maps in different pages can cause multiple texture cache reloads

  21. Mip-maps on large primitives • Primitive is drawn line by line • Wall reloads all mip-maps for every line • Road loads each mip-map only once (Figure: wall and road primitives labelled with mip-map levels 1 to 4)

  22. Tri-linear performance • Tri-linear fill rate is 1/2 the speed of bilinear. • It’s fetching twice the number of pixels • When two mip-map levels are in different pages Tri-linear is 8x slower than bi-linear • Due to multiple page loads per pixel • Solutions • Keep smaller mip-maps in same page • Disable tri-linear for near mipmap levels • Perhaps do tri-linear as 2 pass with alpha

  23. Fill-rate and Fog

  24. Alternative FOG • For larger textured primitives it is quicker to do fog as a second pass • Technique • 1st pass draw a textured primitive • 2nd pass gouraud and alpha blended primitive
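
The two passes give the same result as per-pixel fog if the second pass blends the fog colour over the first pass using the fog amount as alpha. A CPU-side sketch of that equivalence, assuming colours in the range 0 to 1 and a fog amount carried as interpolated vertex alpha (both are assumptions; this is not GS register code):

```python
# Sketch: fog as a second, alpha-blended pass over a textured pass.
def alpha_blend(src, dst, alpha):
    """Standard blend: src * alpha + dst * (1 - alpha), per channel."""
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src, dst))

def two_pass_fog(texel_colour, fog_colour, fog_amount):
    frame = texel_colour                                # 1st pass: textured
    return alpha_blend(fog_colour, frame, fog_amount)   # 2nd pass: fog overlay

print(two_pass_fog((1.0, 0.0, 0.0), (0.5, 0.5, 0.5), 0.5))  # (0.75, 0.25, 0.25)
```

The first pass then runs without fog enabled, and the second pass is untextured, which is where the saving on large primitives would come from.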

  25. Scissoring • Early Pixel reject • Pixels discarded in lines • Eliminates all page misses and texture loads • Speed depends on location of triangle

      Timings by triangle location, one 3x3 grid per triangle size. Note: all timings in GS cycles.

      16x16 triangle    64x64 triangle    128x128 triangle
       7   7   6         25   26   18      52    52   34
       9  12   6         36  280   18      79  1135   34
       4   4   2         12   12    2      25    25    2

  26. Context changes with TEX0_1 • TEX0_1 only takes 2 GS cycles if CLUT isn’t loaded and texture address isn’t changed • TEX2_1 (CLUT) is no quicker than TEX0_1 it just masks some of the TEX0_1 fields

  27. CLUTs • Loading a new CLUT causes 2 things to happen • New CLUT must be loaded • Texture cache is invalidated • Loading just a CLUT is no faster than loading both CLUT and TEXTURE • However, selecting an already loaded CLUT is a zero cost operation.

  28. Fill-rates: Summary • Texture page caches have the biggest effect on fill rate • Subdivide large texture co-ordinate ranges • Keep mip-maps in the same page • Texture reduction also costs fill rate as texel read becomes the bottleneck • Frame buffer page misses aren’t too bad • Cost for big polygons is not bad compared to texture penalties

  29. Making the most of VRAM • 4bit and 8bit palettised textures are the most compact • Tiled textures with repeat and region repeat • Multi-pass techniques • Alpha blending is zero cost • Useful for multi-pass techniques • Useful blend types • Standard blend between SRC and FRAME • Multiply blend (using alpha channel)

  30. Tiling textures • Very easy way to add detail for little cost • Repeat range • 0.10.4 UV (0 - 1024) • 1.11.4 ST (+- 2048) which is 4x the range • Number of repeats reduces for larger textures • Watch out when scissoring massively tiled polygons • Perspective errors • Recalculate smaller texture co-ordinates
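
The repeat limit follows from the co-ordinate range divided by the texture size. A rough sketch, assuming the ranges quoted on the slide are measured in texels and that ST covers -2048 to +2048 (the helper name is illustrative):

```python
# Sketch: how many times a texture can repeat within the co-ordinate range.
UV_SPAN_TEXELS = 1024      # 0.10.4 UV: 0 to 1024
ST_SPAN_TEXELS = 2 * 2048  # 1.11.4 ST: -2048 to +2048, 4x the UV span

def max_repeats(texture_size, span_texels):
    """Whole repeats of a texture of the given size across the span."""
    return span_texels // texture_size

for size in (32, 64, 128, 256):
    print(f"{size:>3} texels: UV {max_repeats(size, UV_SPAN_TEXELS):>3} repeats, "
          f"ST {max_repeats(size, ST_SPAN_TEXELS):>3} repeats")
```

Larger textures use up the range faster, which is why heavily tiled polygons may need smaller co-ordinates recalculated after scissoring.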

  31. Texture Compression • Monochrome textures can compress really well to 4bit

  32. Texture Compression • The eye is sensitive to gradual changes in luminance, so palettes look bad in this case • Here it would be better to reduce the image in size and use the GS bilinear filter to interpolate

  33. Texture Compression • You can add a low bit depth detail map to a low resolution interpolated image • Total size of the 2 images is much less than a single 24bit image. We can also use tiling.

  34. 2 Pass Texture Compression • Original: 24-bit or 32-bit image • Colour map: 1/16 area of original, 8-bit CLUT up to 32-bit • Detail map: full-size, 2-bit or 4-bit grayscale
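
A rough size comparison shows why the two maps together are much smaller than the original. The sketch assumes an 8-bit colour map at 1/4 width and 1/4 height (1/16 area) and ignores CLUT storage; the function names are illustrative.

```python
# Sketch: storage for colour map + detail map versus the original image.
def image_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8

def two_pass_bytes(width, height, detail_bits=4, colour_bits=8):
    """Colour map at 1/16 area plus a full-size detail map (CLUTs ignored)."""
    colour_map = image_bytes(width // 4, height // 4, colour_bits)
    detail_map = image_bytes(width, height, detail_bits)
    return colour_map + detail_map

original = image_bytes(256, 256, 24)
print(original)                                # 196608
print(two_pass_bytes(256, 256))                # 36864 with a 4-bit detail map
print(two_pass_bytes(256, 256, detail_bits=2)) # 20480 with a 2-bit detail map
```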

  35. Texture Compression • Detail map CLUT is concentrated around the centre (Figure: CLUT multiplier axis running from 0.0 to 2.0, centred on 1.0) • Eye is sensitive to small changes in luminance. • Detail map is calculated as: • original pixel / colour map pixel = alpha multiply, which is then mapped to a CLUT.
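
The detail map calculation can be sketched as a divide, a clamp to the 0.0 to 2.0 multiplier range, and a nearest-entry lookup. The CLUT values below are hypothetical, chosen only to be concentrated around 1.0, and the handling of a zero colour value is an assumption.

```python
# Sketch: derive a detail-map index for one pixel (single channel).
EXAMPLE_CLUT = [0.5, 0.9, 1.1, 1.5]  # hypothetical 2-bit CLUT of multipliers

def detail_multiplier(original, colour):
    """original / colour, clamped to the 0.0 - 2.0 range."""
    if colour == 0:
        return 1.0  # assumption: leave black colour-map pixels unscaled
    return min(max(original / colour, 0.0), 2.0)

def nearest_clut_index(multiplier, clut=EXAMPLE_CLUT):
    """Index of the CLUT entry closest to the wanted multiplier."""
    return min(range(len(clut)), key=lambda i: abs(clut[i] - multiplier))

m = detail_multiplier(120, 100)  # original is 20% brighter than colour map
print(m, nearest_clut_index(m))  # 1.2 2
```

At draw time the stored index selects a multiplier from the CLUT, and multiplying the interpolated colour map by it approximates the original pixel.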

  36. 2-bit Luminance Textures • CLUT 1 entries: xx00, xx01, xx10, xx11 • CLUT 2 entries: 00xx, 01xx, 10xx, 11xx (Figure: one 4-bit image read through two CLUTs)
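
The CLUT 1 and CLUT 2 index patterns on this slide suggest that two 2-bit textures share one 4-bit image, with each CLUT ignoring the other texture's bits. A sketch of that reading, with hypothetical function names and luminance levels:

```python
# Sketch: pack two 2-bit luminance textures into one 4-bit indexed image.
def pack_pair(low_texture, high_texture):
    """Combine per-pixel 2-bit values into 4-bit indices (high bits, low bits)."""
    return [(hi << 2) | lo for lo, hi in zip(low_texture, high_texture)]

def make_cluts(levels):
    """Build two 16-entry CLUTs from four luminance levels."""
    clut1 = [levels[i & 3] for i in range(16)]   # xx00..xx11: reads low bits
    clut2 = [levels[i >> 2] for i in range(16)]  # 00xx..11xx: reads high bits
    return clut1, clut2

texture_a = [0, 1, 2, 3]
texture_b = [3, 2, 1, 0]
packed = pack_pair(texture_a, texture_b)
clut1, clut2 = make_cluts([0, 85, 170, 255])
print(packed)                      # [12, 9, 6, 3]
print([clut1[p] for p in packed])  # texture_a: [0, 85, 170, 255]
print([clut2[p] for p in packed])  # texture_b: [255, 170, 85, 0]
```

Switching between the two textures is then only a change of CLUT, with the same image data left in place.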

  37. Texture Compression • Decompressing the texture • Draw low resolution colour map normally • Draw detail map with alpha multiply • Two alternatives for detail map drawing • Decompress to a new texture first • Draw directly using two passes • Colour map can serve as a low-res mipmap • Detail map can be faded in for close ups • Benefit is reduced GIF->GS data transfer

  38. Interlace Flickering • For high resolution you need to run the TV interlaced • Odd and even lines are drawn on alternate frames • Any image not drawn on both lines flickers • Scan line blending solves the problem • This flickering is much more of a problem than edge aliasing.

  39. Interlace Flickering - Solutions • Choose appropriate mip-map textures • For games not guaranteed to run in a frame • Use 2 circuit method (very easy) • If you can run in a frame you can save some VRAM compared to the 2 circuit method • Sprite method: Saves 1/2 a display buffer • Motion blur method: Saves all VRAM • 2 pass method: Saves all VRAM but 2x polygons

  40. Super-sampling techniques and edge Anti-aliasing • Edge anti-aliasing is nice but you must sort your polygons and it’s slower to draw • Down sample is easy but expensive in VRAM • Draw objects to large off-screen buffers and down-sample (we can still Z test if we scale up Z first) • An alternative method • Render 4x with 25% alpha and 1/2 pixel offset in 4 directions. Same effect using extra polygons rather than VRAM

  41. One last thing - Loading screens and framing out • Framing out on loading • Use field mode perhaps • You could use 16bit field mode in the z buffer? • Use a low res background with 2nd circuit text?

  42. Summary • Maximising GS input paths • Transfer textures as 32bit • Consider detail textures and texture tiling • Keeping up fill-rates • Subdivide textures to within caches • Don’t reduce textures • Make use of LOD to avoid <1pixel area triangles • Watch out for penalties on Fog and Mip-maps