


Presentation Transcript


  1. Inside Xbox One
     Martin Fuller, Xbox Advanced Technology Group
     AMD and Microsoft Game Developer Day, June 2, 2014, Stockholm

  2. NDA
     • This is a non-NDA event
     • That means there is a limit to how much I can say, so go easy!

  3. CPU
     • AMD Jaguar (x64): 8 cores arranged in 2 clusters of 4 cores each
     • 1.75 GHz
     • Dual issue
     • Out-of-order execution
     • Speculative execution
     • Store-to-load forwarding
     • SSE4.2 and AVX
        • (Dot product!)
     • 16 x 256-bit wide floating-point registers
     • Hardware pre-fetch
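The "(Dot product!)" note presumably refers to SSE4.1's `DPPS` instruction (`_mm_dp_ps`), which dots two 4-wide vectors. When there are many vectors to process, a structure-of-arrays layout usually goes wider. The minimal sketch below computes four 3D dot products per iteration with plain SSE intrinsics, which every x64 CPU supports. The function name `dot3_soa` and the SoA layout are illustrative assumptions, not code from the talk; a scalar loop handles leftover elements and non-SSE builds.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps
#endif

// Hypothetical helper: out[i] = dot(a[i], b[i]) for n 3D vectors stored as
// structure-of-arrays (separate x, y and z arrays).
void dot3_soa(const float* ax, const float* ay, const float* az,
              const float* bx, const float* by, const float* bz,
              float* out, std::size_t n) {
    assert(n == 0 || out != nullptr);
    std::size_t i = 0;
#if defined(__SSE__)
    // Four dot products per iteration, one per SIMD lane.
    for (; i + 4 <= n; i += 4) {
        const __m128 x = _mm_mul_ps(_mm_loadu_ps(ax + i), _mm_loadu_ps(bx + i));
        const __m128 y = _mm_mul_ps(_mm_loadu_ps(ay + i), _mm_loadu_ps(by + i));
        const __m128 z = _mm_mul_ps(_mm_loadu_ps(az + i), _mm_loadu_ps(bz + i));
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_add_ps(x, y), z));
    }
#endif
    // Scalar tail (and the whole loop when SSE is unavailable).
    for (; i < n; ++i) {
        out[i] = ax[i] * bx[i] + ay[i] * by[i] + az[i] * bz[i];
    }
}
```

The SoA layout means every lane does useful work and no horizontal shuffles are needed, which is why it tends to beat a per-vector `DPPS` when the data allows it.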

  4. Memory
     • 8 GiB of DDR3 at 68 GiB/s
        • Low latency
        • Not enough bandwidth to touch all of memory in a frame: treat RAM as a super-fast cache
     • 48-bit virtual address space
        • 256 TiB
        • Tricky to fragment!
        • Synced between CPU and GPU
     • 4 MiB of L2 cache
        • 2 MiB per cluster
        • MOESI protocol for cache coherency
        • 16-way set associative
        • Per core, up to eight cache requests in flight at once
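Several numbers on this slide follow from simple arithmetic, worked through in the sketch below. A 48-bit address space is 2^48 bytes = 256 TiB. At 68 GiB/s and an assumed 60 frames per second, DRAM can move only about 1.13 GiB per frame, a fraction of the 8 GiB. Assuming 64-byte cache lines (standard on x64, but not stated on the slide) and simple modulo set indexing, a 2 MiB 16-way L2 has 2048 sets, so addresses that are a multiple of 128 KiB apart compete for the same 16 ways. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t KiB = 1024, MiB = 1024 * KiB, GiB = 1024 * MiB, TiB = 1024 * GiB;

// 48-bit virtual addresses: 2^48 bytes.
static_assert((std::uint64_t{1} << 48) == 256 * TiB, "48-bit address space is 256 TiB");

// GiB of DRAM traffic available per frame at a given bandwidth and frame rate.
double gib_per_frame(double gib_per_second, double frames_per_second) {
    assert(frames_per_second > 0.0);
    return gib_per_second / frames_per_second;
}

// Hypothetical description of one set-associative cache.
struct CacheGeometry {
    std::uint64_t size_bytes;
    std::uint64_t line_bytes;  // assumed 64 on x64
    std::uint64_t ways;
};

std::uint64_t num_sets(const CacheGeometry& g) {
    assert(g.line_bytes > 0 && g.ways > 0);
    return g.size_bytes / (g.line_bytes * g.ways);
}

// Set that an address maps to. Plain modulo indexing is an assumption;
// real hardware may hash the index bits.
std::uint64_t set_index(const CacheGeometry& g, std::uint64_t address) {
    return (address / g.line_bytes) % num_sets(g);
}
```

Under these assumptions, 17 or more hot addresses spaced 128 KiB apart overflow a single set's 16 ways and start evicting each other even when the rest of the cache is idle.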

  5. CPU – Recommendations
     • Store-to-load forwarding saves the dreaded LHS (load-hit-store) stall
        • But not spilling registers in the first place is even better
     • The branch predictor is not a crystal ball
        • Branchless tricks learnt in the Xbox 360 era can still apply
     • Hardware data pre-fetch is awesome
        • Only works with arrays
     • Avoid aliasing loads/stores on 2 KiB alignments
        • This causes a false positive that delays load execution
     • Go wide with SSE and leverage all cores
        • No-brainer
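Two of these recommendations are easy to show in portable C++. The sketch below uses branchless select and count idioms over a contiguous array, so there is no data-dependent branch to mispredict and the sequential walk suits the hardware pre-fetcher. It also includes a helper that flags a store/load pair whose addresses are an exact multiple of 2 KiB apart, the pattern the slide warns about. All names are hypothetical, and whether the compiler really emits branch-free code should be checked in the disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Branchless select: returns a if cond is true, else b, using a bit mask
// instead of a conditional jump.
inline std::int32_t select_i32(bool cond, std::int32_t a, std::int32_t b) {
    const std::int32_t mask = -static_cast<std::int32_t>(cond);  // all ones or zero
    return (a & mask) | (b & ~mask);
}

// Counts elements above a threshold. The comparison result (0 or 1) is added
// directly, so there is no data-dependent branch, and the linear walk over a
// contiguous array is what the hardware pre-fetcher handles well.
std::size_t count_above(const float* values, std::size_t n, float threshold) {
    assert(n == 0 || values != nullptr);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += static_cast<std::size_t>(values[i] > threshold);
    }
    return count;
}

// True when two different addresses have identical low 11 bits, i.e. they are
// an exact multiple of 2 KiB apart: a store to one followed by a load from the
// other can be falsely treated as aliasing.
inline bool same_2k_offset(const void* store_addr, const void* load_addr) {
    const auto s = reinterpret_cast<std::uintptr_t>(store_addr);
    const auto l = reinterpret_cast<std::uintptr_t>(load_addr);
    return s != l && ((s ^ l) & 0x7FF) == 0;
}
```

A typical fix for a flagged pair is to pad one buffer by a cache line or two so the two streams no longer share the same offset within 2 KiB.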

  6. GPU
     • AMD GCN, 768 SPUs
     • 853 MHz
     • 32 MiB of ESRAM at 109 GiB/s
     • 4 Move Engines
     • 3 hardware display planes
        • Resolution independent
        • Frame rate independent
     • Exact sRGB this time!
        • (Oh, and it’s free)
     • Hardware video encode and decode
     • HDMI 1.4a in and out

  7. Move Engines
     • More than just DMA copy:
        • Memory set
        • Texture swizzle
        • JPEG decompress
        • LZ compress and decompress
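The tiling formats actually used by the GPU and Move Engines aren't described in this talk, so the sketch below uses plain Morton (Z-order) as a stand-in to show why swizzling helps: texels that are close in 2D end up close in memory, so a small 2D footprint touches fewer cache lines and DRAM pages. `morton2d` and `swizzle_morton` are hypothetical names, and the square power-of-two restriction is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Spreads the low 16 bits of x so there is a zero bit between each original bit.
inline std::uint32_t part1by1(std::uint32_t x) {
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Z-order index: x bits in the even positions, y bits in the odd positions.
inline std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}

// Converts a row-major (linear) square texture into Morton order.
template <typename Texel>
std::vector<Texel> swizzle_morton(const std::vector<Texel>& linear, std::uint32_t size) {
    assert(size > 0 && (size & (size - 1)) == 0);  // power of two
    assert(size <= 65536);                          // part1by1 keeps 16 bits per axis
    assert(linear.size() == std::size_t{size} * size);
    std::vector<Texel> swizzled(linear.size());
    for (std::uint32_t y = 0; y < size; ++y) {
        for (std::uint32_t x = 0; x < size; ++x) {
            swizzled[morton2d(x, y)] = linear[std::size_t{y} * size + x];
        }
    }
    return swizzled;
}
```

In this layout an aligned 2x2 quad of texels always occupies four consecutive elements, whereas in row-major order its two rows are a full row pitch apart.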

  8. ESRAM
     • 32 MiB of general-purpose RAM
        • Not like the EDRAM on Xbox 360
     • 109 GiB/s
        • Sometimes faster in practice!
     • Zero contention
        • Not shared with the CPU, SRAs or video out
     • ESRAM makes everything better:
        • Render targets
        • Textures
        • Geometry
        • Compute tasks

  9. ESRAM – Sometimes faster in practice?
     • ESRAM can handle concurrent reads and writes, increasing effective bandwidth above 109 GiB/s
     • Operations that can take advantage of this:
        • Read-modify-write operations
        • Depth buffer / HTILE update
        • Alpha blending
        • Oh, and concurrently DMA’ing resources in/out of ESRAM while also rendering
     • How much effective bandwidth can titles achieve?
        • The current record holder achieved 141 GiB/s from ESRAM (a post-processing pass in a real title)
        • Of course, all titles combine ESRAM’s >= 109 GiB/s with DRAM’s 68 GiB/s

  10. ESRAM – The Four Stages of Adoption
     • Stage 1: Statically allocate a small number of render targets in ESRAM
     • Stage 2: Alias the same memory for re-use later
     • Stage 3: Partial residency
        • Put the top strip of render targets (sky) in DRAM, the rest in ESRAM
     • Stage 4: Asynchronously DMA resources in/out of ESRAM
     • Launch titles were at stages 1 and 2
     • 2nd-wave titles are now starting to tackle stages 3 and/or 4
     • 3rd+ wave titles will get really good at this!
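The second stage, aliasing, comes down to giving render targets whose lifetimes never overlap the same ESRAM address range. The greedy placement below is one minimal way to do it, assuming each target's lifetime is known as an inclusive range of pass indices. It places the largest targets first, each at the lowest offset that doesn't collide with an already-placed target alive at the same time, and leaves anything that doesn't fit in DRAM. The `RenderTarget` type and `assign_esram` function are hypothetical, and sizes ignore alignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint64_t kEsramBytes = std::uint64_t{32} << 20;  // 32 MiB

// Hypothetical description of a transient render target.
struct RenderTarget {
    std::string name;
    std::uint64_t size;        // bytes (alignment ignored in this sketch)
    int first_pass;            // first pass that touches it
    int last_pass;             // last pass that touches it (inclusive)
    std::uint64_t offset = 0;  // ESRAM offset, valid only if in_esram
    bool in_esram = false;     // false: left in DRAM
};

inline bool lifetimes_overlap(const RenderTarget& a, const RenderTarget& b) {
    return a.first_pass <= b.last_pass && b.first_pass <= a.last_pass;
}

// Greedy placement: largest targets first, each at the lowest offset whose
// address range is free of every placed target with an overlapping lifetime.
void assign_esram(std::vector<RenderTarget>& targets, std::uint64_t capacity = kEsramBytes) {
    std::vector<RenderTarget*> order;
    for (RenderTarget& t : targets) {
        assert(t.size > 0 && t.first_pass <= t.last_pass);
        t.in_esram = false;
        order.push_back(&t);
    }
    std::stable_sort(order.begin(), order.end(),
                     [](const RenderTarget* a, const RenderTarget* b) { return a->size > b->size; });

    std::vector<const RenderTarget*> placed;
    for (RenderTarget* t : order) {
        // Candidate offsets: 0, plus the end of every placed target alive at the same time.
        std::vector<std::uint64_t> candidates{0};
        for (const RenderTarget* p : placed) {
            if (lifetimes_overlap(*t, *p)) candidates.push_back(p->offset + p->size);
        }
        std::sort(candidates.begin(), candidates.end());
        for (std::uint64_t c : candidates) {
            if (c + t->size > capacity) break;  // sorted, so no later candidate fits either
            bool clash = false;
            for (const RenderTarget* p : placed) {
                if (lifetimes_overlap(*t, *p) && c < p->offset + p->size && p->offset < c + t->size) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                t->offset = c;
                t->in_esram = true;
                placed.push_back(t);
                break;
            }
        }
    }
}
```

For example, a G-buffer used in passes 0 and 1 and a lighting target used in passes 2 and 3 can share offset 0, while a depth buffer alive across passes 0 to 2 has to sit beside both.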

  11. ESRAM – Memory Maps!
     • It’s like the 8-bit days all over again! (Sort of)
     • Plan the asynchronous moves
        • Move resources in/out asynchronously while also rendering
        • New memory map at each stage of the render pipeline
     • Don’t forget: swizzle textures on DMA
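One way to plan the asynchronous moves is to write down which resources must be resident in ESRAM at each stage of the pipeline (the per-stage memory map) and diff consecutive stages. Whatever leaves the set is DMA'd out, or simply dropped if its contents are no longer needed. Whatever joins the set is DMA'd in, ideally kicked off while the previous stage is still rendering. The sketch assumes resources are identified by name; `plan_moves` and `StageTransition` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Residency = std::set<std::string>;  // resources resident in ESRAM for one stage

struct StageTransition {
    std::vector<std::string> move_out;  // resident before, not needed in ESRAM after
    std::vector<std::string> move_in;   // needed in ESRAM for the next stage
};

// For each pair of consecutive stages, lists what must leave and enter ESRAM.
std::vector<StageTransition> plan_moves(const std::vector<Residency>& stages) {
    std::vector<StageTransition> plan;
    for (std::size_t i = 0; i + 1 < stages.size(); ++i) {
        StageTransition t;
        std::set_difference(stages[i].begin(), stages[i].end(),
                            stages[i + 1].begin(), stages[i + 1].end(),
                            std::back_inserter(t.move_out));
        std::set_difference(stages[i + 1].begin(), stages[i + 1].end(),
                            stages[i].begin(), stages[i].end(),
                            std::back_inserter(t.move_in));
        plan.push_back(std::move(t));
    }
    return plan;
}
```

A real planner would also check that each stage's resident set fits in 32 MiB and would start each `move_in` early enough to overlap with the preceding stage's rendering.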

  12. Maxing out the GPU
     • Are you bandwidth limited?
     • Have you maxed out the fixed-function hardware?
     • Do you have spare compute resources?
        • Then use async compute!
     • Titles have barely scratched the surface yet:
        • Watch this space!

  13. The usual GPU recommendations
     • Use ESRAM
        • First for depth / stencil
        • Then colour targets
        • Then everything else
     • Sort by state / shader; use hardware instancing
        • (Batch, batch, batch!)
     • Always swizzle textures
     • Be wary of using too many general-purpose registers
        • Keep an eye on occupancy in PIX; we normally recommend >= 4
     • Avoid reading DRAM via the CPU-coherent bus
     • There is no hardware integer divide
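Without a hardware integer divide, division has to be emulated with a sequence of other instructions, so hot code usually replaces division by a value that is reused many times with a multiply by a precomputed reciprocal. The sketch below shows that trick on the CPU side in C++, using a known round-up-reciprocal method: for 32-bit unsigned numerators and any divisor d >= 2, M = floor((2^64 - 1) / d) + 1 gives the exact quotient as the high 64 bits of M * n. It relies on the `unsigned __int128` extension of GCC and Clang, the function names are hypothetical, and a shader version would have to be built from that language's own multiply operations.

```cpp
#include <cassert>
#include <cstdint>

// Precompute once per divisor: M = floor((2^64 - 1) / d) + 1, i.e. ceil(2^64 / d).
inline std::uint64_t reciprocal_magic(std::uint32_t d) {
    assert(d >= 2);  // d == 1 would overflow M to 0
    return ~std::uint64_t{0} / d + 1;
}

// Exact n / d for any 32-bit n: the high 64 bits of the 128-bit product M * n.
inline std::uint32_t divide_by_magic(std::uint32_t n, std::uint64_t magic) {
    return static_cast<std::uint32_t>((static_cast<unsigned __int128>(magic) * n) >> 64);
}
```

Compilers already apply this kind of rewrite when the divisor is a compile-time constant; the precomputed form pays off when the divisor is only known at run time but reused many times, such as a texture width.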

  14. Graphics API
     • DX11 was designed for the desktop (a long time ago, 2008!)
        • Abstracts a variety of different GPU architectures
        • Manages VRAM residency for you
           • Oversubscribing VRAM is a serious performance pitfall
        • Handles hazards
           • Developers can handle these at a higher level => less cost
     • Xbox One will run vanilla DX11 PC code
        • Easy port
        • Extensions available for low-level access
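Handling hazards means that when a resource is read after being written, or written after being read or written, the GPU typically has to wait and flush caches in between. Below is a minimal sketch of the per-access bookkeeping a driver would otherwise do for you, assuming only two access types. A title that knows its frame structure can instead place one barrier per pass rather than checking every access. `HazardTracker` and `Access` are hypothetical names, not DX11 or DX11.X API.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

enum class Access { Read, Write };

// Hypothetical per-resource hazard tracking: remembers each resource's last access.
class HazardTracker {
public:
    // Records an access and returns true if a barrier is needed before it:
    // read-after-write, write-after-read and write-after-write are hazards;
    // read-after-read is not.
    bool on_access(const std::string& resource, Access access) {
        assert(!resource.empty());
        const auto it = last_access_.find(resource);
        const bool needs_barrier =
            it != last_access_.end() &&
            (it->second == Access::Write || access == Access::Write);
        last_access_[resource] = access;
        return needs_barrier;
    }

private:
    std::unordered_map<std::string, Access> last_access_;
};
```

Doing this lookup on every bind is the per-call cost the slide refers to; hoisting the decision to frame-graph level removes it.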

  15. Graphics API
     • DX11.X
     • Some DX12 features available right now on Xbox:
        • Turn off hazard tracking
        • Simple fence API
        • Deferred contexts re-implemented
        • New resource descriptor model
        • Draw bundles
        • (Xbox-specific, not the DX12 API)
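A fence, in the usual sense, is a monotonically increasing counter. The CPU tags a batch of work with the next value, the GPU writes that value when the work completes, and the CPU checks or waits until the completed value has reached it, for example before reusing a buffer. The sketch below models that protocol on the CPU with `std::atomic`; in a real system the producer calling `signal` would be the GPU. The `Fence` class and its methods are hypothetical and are not the DX11.X or DX12 API.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical fence: a monotonically increasing completion counter.
class Fence {
public:
    // Producer side (the GPU in a real system): marks work up to 'value' as done.
    void signal(std::uint64_t value) {
        assert(value >= completed_.load(std::memory_order_relaxed));  // must not go backwards
        completed_.store(value, std::memory_order_release);
    }

    // Consumer side: has the work tagged with 'value' finished?
    bool is_complete(std::uint64_t value) const {
        return completed_.load(std::memory_order_acquire) >= value;
    }

    // Consumer side: the value the next batch of work should signal.
    std::uint64_t next_value() { return ++last_issued_; }

private:
    std::atomic<std::uint64_t> completed_{0};
    std::uint64_t last_issued_ = 0;  // only touched by the issuing thread
};
```

Because the counter only grows, a single comparison answers "is everything up to batch N done?", which is what makes such an API simple compared with per-resource tracking.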

  16. DRAM – Contention
     • The CPU cannot saturate DRAM bandwidth on its own; the GPU can!
        • Significant performance degradation from DRAM contention
        • Fancy CPU features don’t help if memory-starved
     • 10. Use ESRAM as much as possible
       20. Leave DRAM for the CPU and DMA
       30. goto 10;

  17. DRAM – Love your bandwidth
     • Hardware data cache pre-fetch units are awesome
        • Manual pre-fetch is near pointless once hardware pre-fetch is spinning
        • It wastes bandwidth if you are only operating on small arrays
     • Write-combined memory pages and SSE streaming store instructions bypass the cache
        • No load: halves the bandwidth consumed by the CPU
     • Pack your data!
        • Expanding / compressing data is cheap (CPU & GPU)
        • F16C (half <-> float) CPU instructions
     • Store-to-load forwarding avoids LHS stalls
     • Swizzle your textures
        • Move engines can swizzle on copy
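To illustrate how cheap unpacking is, the sketch below decodes IEEE 754 half-precision values in portable C++, the same conversion the F16C `VCVTPH2PS` instruction (`_mm_cvtph_ps`) performs in hardware. It also packs [0, 1] floats into 16-bit UNORM values. On F16C-capable hardware the intrinsic is the faster choice; the portable version is shown so the bit layout is visible. Function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decodes an IEEE 754 binary16 value: 1 sign bit, 5 exponent bits (bias 15),
// 10 mantissa bits.
float half_to_float(std::uint16_t h) {
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;
    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24.
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 0x1F) {
        // Infinity or NaN.
        magnitude = mantissa == 0 ? std::numeric_limits<float>::infinity()
                                  : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400u),
                               static_cast<int>(exponent) - 25);
    }
    return (h & 0x8000u) ? -magnitude : magnitude;
}

// Packs a float in [0, 1] into a 16-bit unsigned normalized value (round to nearest).
std::uint16_t pack_unorm16(float x) {
    assert(!std::isnan(x));
    const float clamped = std::fmin(std::fmax(x, 0.0f), 1.0f);
    return static_cast<std::uint16_t>(std::lround(clamped * 65535.0f));
}

float unpack_unorm16(std::uint16_t v) { return static_cast<float>(v) / 65535.0f; }
```

Either encoding halves the bytes of a 32-bit float stream, which is what "pack your data" buys when DRAM bandwidth is the bottleneck.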

  18. Audio
     • Custom audio hardware
        • Very fast
        • Lots of features
        • Kinda cool!
     • Nuff said

  19. 3x Operating Systems
     • ERA
        • Exclusive Resource Allocation
        • Only one active at a time
        • Custom OS
        • (Games!)
     • SRA
        • Shared Resource Allocation
        • Win8 core
        • (Apps)
     • Hypervisor
        • SRA and ERA use different virtual address spaces

  20. PLM (Program Lifetime Management)
     • ERA can be in one of several states:
        • Full screen
           • Full resources (even with a snapped app up)
        • Constrained (windowed)
           • Slightly less CPU and GPU resource
           • No input
           • Same amount of memory
        • Suspended
           • Zero CPU and GPU resource
           • No input
           • Same amount of memory
           • Limited time to save after receiving a suspend message

  21. Kinect 2.0
     • Hardware:
        • Higher-resolution colour and depth
        • Better ranges
        • New – infrared!
        • Microphone array
        • No tilt motor
     • Software:
        • Improved skeletal tracking
        • Improved biometrics

  22. Streaming install
     • 6x Blu-ray = ~26 MiB/s
        • Installing a 50 GiB Blu-ray at ~26 MiB/s takes ~33 minutes
        • Too long to wait… bored now…
     • The game must be able to start once an initial payload has been installed
     • While running, the title can hint at what to install next
     • No direct access to the Blu-ray
        • The content could be a digital download
     • It’s obvious but I’ll say it anyway: compress your assets!
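The install-time estimate is easy to reproduce. Assuming the standard 1x Blu-ray data rate of 36 Mbit/s, a 6x drive reads 27,000,000 bytes/s, about 25.7 MiB/s. A 50 GiB disc then takes about 33 minutes to copy, and halving the bytes with 2:1 compression roughly halves that, provided decompression keeps up with the drive. The helper names are hypothetical.

```cpp
#include <cassert>

// Assumed 1x Blu-ray data rate: 36 Mbit/s.
constexpr double kBluRay1xBytesPerSecond = 36.0e6 / 8.0;
constexpr double kMiB = 1024.0 * 1024.0;

// Sustained drive throughput in MiB/s for a given speed multiple (e.g. 6 for 6x).
double drive_mib_per_s(double speed_multiple) {
    assert(speed_multiple > 0.0);
    return speed_multiple * kBluRay1xBytesPerSecond / kMiB;
}

// Minutes to install payload_gib GiB at mib_per_s MiB/s.
double install_minutes(double payload_gib, double mib_per_s) {
    assert(mib_per_s > 0.0);
    return payload_gib * 1024.0 / mib_per_s / 60.0;
}
```

For example, `install_minutes(50.0, 26.0)` is about 32.8 minutes, while `install_minutes(25.0, 26.0)`, the same content at 2:1 compression, is about 16.4 minutes.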

  23. The Cloud
     • Cloud compute:
        • Developers’ code is hosted and executed in Windows Azure
        • Game code execution automatically scales based on usage
     • Live services:
        • Stats, analytics, matchmaking & storage
        • Secure!

  24. Challenges
     • Is your code 64-bit compliant?
     • Can you scale to 6 cores?
     • Adopt new DX11.X API extensions
        • Manage your own resource hazards
     • Make sure you use ESRAM effectively
     • Package content for streaming install
     • Game design considerations
        • Quick save on ERA termination
        • Kinect, SmartGlass
        • Cloud services

  25. Thank You! – Questions?
     • (That I’m allowed to answer)
