Memory Management Strategies Master Class
Game Connection 2012


    Presentation Transcript
    1. Memory Management Strategies Master Class Game Connection 2012

    2. About myself • Studied computer science at VUT, Austria • Working in the games industry since 2004 • PC, Xbox 360, PS2, PS3, Wii, DS • Specialization in low-level programming (threading, debugging, optimization) • Teaching • Founder & CTO @ Molecular Matters • Middleware for the games industry

    3. Master class • Participation • Exchange of experiences • Discussion • There is no perfect way of doing things • There are many „rights“ & „wrongs“ • Let us talk about past experiences, mistakes, improvements • Share ideas! • Ask questions!

    4. Agenda • C++ new/delete/placement syntax • Virtual memory • Allocators • Allocation strategies • Debugging facilities • Fill patterns • Bounds checking • Memory tracking

    5. Agenda (cont'd) • Custom memory system • Relocatable allocations • Run-time defragmentation • Debugging memory-related bugs • Stack overflow • Memory overwrites

    6. C++ new/delete/placement syntax

    7. What's wrong with that? • void* operator new(size_t size, unsigned int align) { /* align memory by some means */ return _aligned_malloc(size, align); } NonPOD* nonPod = new (32) NonPOD; NonPOD* nonPodAr = new (32) NonPOD[10]; • Addresses of nonPod and array?

    8. C++ new • How do we allocate memory? • Using the new operator (keyword new) • T* instance = new T; • What happens behind the scenes? • Calls operator new to allocate storage for a T • Calls the constructor for non-POD types

    9. C++ delete • How do we free memory? • Using the delete operator (keyword delete) • delete instance; • What happens behind the scenes? • Calls the destructor for non-POD types • Calls operator delete to free storage

    10. C++ new, placement syntax • Keyword new supports placement syntax • Canonical form called placement new • Calls operator new(size_t, void*) • Returns the given pointer, does not allocate memory • Constructs an instance in-place • T* instance = new (memory) T; • Destructor needs to be called manually • instance->~T();
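
    The placement-new steps on this slide can be sketched as follows. This is a minimal illustration, not code from the talk: the type Tracked, its liveCount counter and the function placementNewDemo are hypothetical names introduced only to make construction and destruction observable.

```cpp
#include <cassert>
#include <new>      // declares the placement form operator new(size_t, void*)

// Hypothetical example type; counts live instances so that construction
// and destruction can be observed.
struct Tracked {
    static int liveCount;
    int value;
    explicit Tracked(int v) : value(v) { ++liveCount; }
    ~Tracked() { --liveCount; }
};
int Tracked::liveCount = 0;

// Constructs a Tracked in caller-provided storage, then destroys it manually.
// Returns true if the instance sits exactly at the start of the buffer and
// no instance is alive afterwards.
bool placementNewDemo() {
    alignas(Tracked) unsigned char memory[sizeof(Tracked)];

    // No memory is allocated here; the constructor runs on the given address.
    Tracked* instance = new (memory) Tracked(42);
    const bool inPlace =
        static_cast<void*>(instance) == static_cast<void*>(memory);
    assert(Tracked::liveCount == 1 && instance->value == 42);

    // Nothing will call the destructor for us, so it is called by hand.
    instance->~Tracked();
    return inPlace && Tracked::liveCount == 0;
}
```

    The buffer here lives on the stack, but the same pattern applies to memory obtained from any custom allocator.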

    11. C++ new, placement syntax (cont'd) • Placement syntax supports N parameters • The compiler maps keyword new to the corresponding overload of operator new • T* instance = new (10, 20, 30) T; calls void* operator new(size_t, int, int, int); • First argument must always be of type size_t • sizeof(T) is inserted by the compiler

    12. C++ new, placement syntax (cont'd) • Very powerful! • Custom overloads for operator new • Each overload must offer a corresponding operator delete • Can store arbitrary arguments for each call to new • An operator is just a function • Can be called directly if desired • Needs manual constructor call using placement new • Can use templates
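
    As an illustration of custom overloads that store arguments per call, here is a small sketch that records a file name and line number for each allocation. The names AllocInfo, g_lastAlloc, Enemy, destroyTracked and trackedNewDemo are assumptions made for this example; std::malloc stands in for whatever allocator would really be used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical record of the most recent tracked allocation.
struct AllocInfo {
    const char* file;
    int line;
    std::size_t size;
};
static AllocInfo g_lastAlloc = {nullptr, 0, 0};

// Placement-syntax overload: the compiler passes sizeof(T) as first argument,
// the remaining arguments come from the parentheses after keyword new.
void* operator new(std::size_t size, const char* file, int line) {
    g_lastAlloc.file = file;
    g_lastAlloc.line = line;
    g_lastAlloc.size = size;
    void* memory = std::malloc(size);
    if (!memory) throw std::bad_alloc();
    return memory;
}

// Matching overload. The compiler only calls it automatically when the
// constructor throws; otherwise it has to be called directly.
void operator delete(void* ptr, const char*, int) noexcept {
    std::free(ptr);
}

struct Enemy { int health = 100; };

// Destroys and frees an object created with the overload above.
template <typename T>
void destroyTracked(T* object) {
    object->~T();                      // manual destructor call
    operator delete(object, "", 0);    // direct call of the overload
}

int trackedNewDemo() {
    Enemy* enemy = new ("enemy.cpp", 17) Enemy;
    const int health = enemy->health;
    destroyTracked(enemy);
    return health;
}
```

    In practice the file and line arguments would usually be supplied by a macro using __FILE__ and __LINE__.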

    13. C++ delete, placement syntax • Keyword delete does not support placement syntax • delete (instance, 10, 20); • Treated as a statement using the comma operator • Overloads of operator delete used when an exception is thrown upon a call to new • Overloads can also be called directly • Needs manual destructor call

    14. C++ new[] • Creates an array of instances • Similar to keyword new, calls operator new[] • Calls the constructor for each non-POD instance • Supports placement syntax • Custom overloads of operator new[] possible • Size passed as the first argument is compiler-specific (it may include extra bookkeeping bytes) • POD vs. non-POD!

    15. C++ new[] (cont'd) • For non-PODs, constructors are called • delete[] needs to call destructors • How many destructors to call? • Compiler needs to store the number of instances • Most compilers add an extra 4 bytes to the allocation size (on 32-bit targets; the exact amount is compiler-specific) • sizeof(T)*N + 4 (non-POD) vs. sizeof(T)*N (POD)

    16. C++ new[] (cont'd) • Important! • Address returned by operator new[] != address to first instance in the array • Source of confusion • Compiler-specific behaviour, makes it almost impossible to call overloads of operator delete[] directly • Do we need to go back 4 bytes or not? • Makes support for custom alignment harder
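
    The difference between the address returned by operator new[] and the address of the first instance can be observed with a class-specific overload. The sketch below is an assumption-laden illustration: NonPOD, g_requestedBytes, g_rawBlock and arrayCookieBytes are hypothetical names, and the measured offset depends on the compiler and target, so no particular value is claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// What the class-specific operator new[] below observed on its last call.
static std::size_t g_requestedBytes = 0;
static void* g_rawBlock = nullptr;

// Hypothetical type with a user-provided destructor, so delete[] has to
// know how many instances to destruct.
struct NonPOD {
    int value;
    NonPOD() : value(7) {}
    ~NonPOD() {}

    static void* operator new[](std::size_t size) {
        g_requestedBytes = size;           // may be larger than sizeof(T)*N
        g_rawBlock = std::malloc(size);
        if (!g_rawBlock) throw std::bad_alloc();
        return g_rawBlock;
    }
    static void operator delete[](void* ptr) noexcept { std::free(ptr); }
};

// Returns how many bytes lie between the block returned by operator new[]
// and the first instance of the array.
std::size_t arrayCookieBytes(std::size_t count) {
    NonPOD* array = new NonPOD[count];
    const std::size_t offset = static_cast<std::size_t>(
        reinterpret_cast<unsigned char*>(array) -
        static_cast<unsigned char*>(g_rawBlock));
    delete[] array;
    return offset;
}
```

    Comparing g_requestedBytes against sizeof(NonPOD) * count on a given compiler shows how many bookkeeping bytes that compiler adds.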

    17. C++ delete[] • Deletes an array of instances • Similar to keyword delete, calls operator delete[] • Calls the destructors for each non-POD instance in reverse order • Again, POD vs. non-POD • Number of instances to destruct is stored by the compiler for non-POD types

    18. C++ new vs. delete mismatch • Allocating with new, deleting with delete[] • delete[] expects a stored number of instances in front of the array • May crash • Allocating with new[], deleting with delete • More subtle bugs, only one destructor will be called • Visual Studio heap implementation is smart enough to detect both mismatches

    19. Summary • new != operator new • delete != operator delete • new[]/delete[] are compiler-specific • Never mix new/delete[] and new[]/delete • new offers powerful placement syntax

    20. Virtual memory

    21. Virtual memory • Each process = virtual address space • Not to be confused with paging to hard disk • Virtual memory != physical memory • Address translation done by MMU • OS allocates/reserves memory in pages • Page sizes: 4KB, 64KB, 1MB, ...

    22. Virtual memory (cont'd) • Virtual addresses are mapped to physical memory addresses • Contiguous virtual addresses != contiguous physical memory • A single page is the smallest amount of memory that can be allocated • Access restrictions on a per-page level • Read, write, execute, ...

    23. Virtual memory (cont'd) • Simplest address translation: • Virtual address = page directory index + offset • Page directory entry = physical memory page + additional info • Page directory entries set by OS • In practice: Multi-level address translation • See „What every programmer should know about memory“ by Ulrich Drepper • http://lwn.net/Articles/253361/
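
    The simplest, single-level translation described above can be written down as plain arithmetic. This sketch assumes a 4 KB page size and a flat table that maps a page index directly to a physical page base address; kPageSize, SplitAddress, splitVirtualAddress and translate are hypothetical names, and real hardware uses multi-level tables with additional flags per entry.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size of 4 KB; other sizes work the same way.
constexpr std::uint64_t kPageSize = 4096;

struct SplitAddress {
    std::uint64_t pageIndex;   // selects the table entry
    std::uint64_t offset;      // byte offset inside the page
};

// Splits a virtual address into page index and offset within the page.
constexpr SplitAddress splitVirtualAddress(std::uint64_t address) {
    return SplitAddress{address / kPageSize, address % kPageSize};
}

// Hypothetical single-level table: entry i holds the physical base address
// of virtual page i. The caller guarantees the index is within the table.
inline std::uint64_t translate(const std::uint64_t* pageTable,
                               std::uint64_t virtualAddress) {
    const SplitAddress split = splitVirtualAddress(virtualAddress);
    return pageTable[split.pageIndex] + split.offset;
}
```

    Note that neighbouring virtual pages may map to physical pages that are far apart, which is why contiguous virtual addresses do not imply contiguous physical memory.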

    24. Virtual memory (cont'd) • Address translation is expensive • Several accesses to memory • „Page walk“ • Result of address translation is cached • Translation Lookaside Buffer (TLB) • Multiple levels, like D$ or I$ • TLB = Global resource per processor

    25. Virtual memory (cont'd) • Allows allocating contiguous virtual memory even if the physical memory is not contiguous • Available on many architectures (PC, Mac, Linux, almost all consoles) • Used by CPU only • GPU, sound hardware, etc. need contiguous physical memory • E.g. XPhysicalAlloc

    26. Virtual memory (cont'd) • Growing allocators can account for worst-case scenarios more easily when using VM • Different address ranges for different purposes • Heap, stack, code, write-combined, ... • Helps with debugging!

    27. Summary • Virtual memory is nice to have, but not a necessity • Can help tremendously with debugging • Virtual memory is made available to the CPU, not the GPU or other hardware • Virtual memory address range >> RAM

    28. Allocators

    29. Why different allocators? • No silver bullet, many allocation qualities • Size • Fragmentation • Wasted space • Performance • Thread-safety • Cache-locality • Fixed size vs. growing

    30. Common allocators • Linear • Stack, double-ended stack • Pool • Micro • One-frame, two-frame temporary • Double-buffered I/O • General-purpose

    31. Linear allocator • + Supports any size and alignment • + Extremely fast, simply bumps a pointer • + No fragmentation • + No wasted space • + Lock-free implementation possible • + Allocations live next to each other • - Must free all allocations at once
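
    A linear allocator along these lines might look like the following single-threaded sketch. The class name and its interface (Allocate, Reset) are assumptions for illustration; alignment is assumed to be a power of two, and a lock-free variant would need atomic operations on the current pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal linear allocator working on a caller-provided memory range.
class LinearAllocator {
public:
    LinearAllocator(void* start, std::size_t size)
        : m_start(static_cast<unsigned char*>(start)),
          m_end(m_start + size),
          m_current(m_start) {}

    // Bumps the current pointer. alignment must be a power of two.
    // Returns nullptr if the remaining memory is too small.
    void* Allocate(std::size_t size, std::size_t alignment) {
        const std::uintptr_t address =
            reinterpret_cast<std::uintptr_t>(m_current);
        const std::uintptr_t aligned =
            (address + (alignment - 1)) & ~(std::uintptr_t(alignment) - 1);
        const std::size_t padding = static_cast<std::size_t>(aligned - address);
        const std::size_t remaining =
            static_cast<std::size_t>(m_end - m_current);
        if (padding > remaining || size > remaining - padding) {
            return nullptr;
        }
        unsigned char* result = m_current + padding;
        m_current = result + size;
        return result;
    }

    // Individual allocations cannot be freed; everything goes at once.
    void Reset() { m_current = m_start; }

private:
    unsigned char* m_start;
    unsigned char* m_end;
    unsigned char* m_current;
};
```

    Usage is typically a burst of Allocate calls followed by a single Reset, for example once per level or per frame.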

    32. Stack allocator • + Supports any size and alignment • + Extremely fast, simply bumps a pointer • + No fragmentation • + No wasted space • + Lock-free implementation possible • + Allocations live next to each other • +/- Must free allocations in reverse-order
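
    One way to express the reverse-order constraint is a marker-based interface, sketched below. This is only one possible design and not necessarily the one used in the talk: the names StackAllocator, Marker, GetMarker and FreeToMarker are assumptions, and alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal stack allocator: allocations are released by rolling back to a
// previously taken marker, which enforces reverse-order freeing.
class StackAllocator {
public:
    typedef std::size_t Marker;

    StackAllocator(void* start, std::size_t size)
        : m_start(static_cast<unsigned char*>(start)), m_size(size), m_offset(0) {}

    // alignment must be a power of two. Returns nullptr when out of memory.
    void* Allocate(std::size_t size, std::size_t alignment) {
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(m_start);
        const std::uintptr_t aligned =
            (base + m_offset + (alignment - 1)) & ~(std::uintptr_t(alignment) - 1);
        const std::size_t begin = static_cast<std::size_t>(aligned - base);
        if (begin > m_size || size > m_size - begin) {
            return nullptr;
        }
        m_offset = begin + size;
        return m_start + begin;
    }

    // Remembers the current top of the stack.
    Marker GetMarker() const { return m_offset; }

    // Frees everything that was allocated after the marker was taken.
    void FreeToMarker(Marker marker) {
        assert(marker <= m_offset);
        m_offset = marker;
    }

private:
    unsigned char* m_start;
    std::size_t m_size;
    std::size_t m_offset;
};
```

    A marker taken before loading a temporary resource lets all memory used by that resource be released in a single call.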

    33. Double-ended stack allocator • Similar to stack allocator • Can allocate from bottom or top • Bottom for resident allocations • Top for temporary allocations • Mostly used for level loading
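
    The two ends can share one memory range and grow towards each other. The following sketch assumes power-of-two alignments and uses hypothetical names (DoubleEndedStack, AllocateBottom, AllocateTop, ResetTop); it only shows the bookkeeping, not a production-ready allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Double-ended stack: resident data grows upwards from the bottom,
// temporary data grows downwards from the top of the same range.
class DoubleEndedStack {
public:
    DoubleEndedStack(void* start, std::size_t size)
        : m_bottom(reinterpret_cast<std::uintptr_t>(start)),
          m_end(m_bottom + size),
          m_top(m_end) {}

    // Resident allocations, e.g. data that stays for the whole level.
    void* AllocateBottom(std::size_t size, std::size_t alignment) {
        const std::uintptr_t aligned =
            (m_bottom + (alignment - 1)) & ~(std::uintptr_t(alignment) - 1);
        if (aligned > m_top || size > m_top - aligned) {
            return nullptr;
        }
        m_bottom = aligned + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Temporary allocations, e.g. scratch buffers used while loading.
    void* AllocateTop(std::size_t size, std::size_t alignment) {
        if (size > m_top - m_bottom) {
            return nullptr;
        }
        const std::uintptr_t aligned =
            (m_top - size) & ~(std::uintptr_t(alignment) - 1);
        if (aligned < m_bottom) {
            return nullptr;
        }
        m_top = aligned;
        return reinterpret_cast<void*>(aligned);
    }

    // Drops all temporary allocations, e.g. after level loading finished.
    void ResetTop() { m_top = m_end; }

private:
    std::uintptr_t m_bottom;
    std::uintptr_t m_end;
    std::uintptr_t m_top;
};
```

    Once the temporary end has been reset, the space it occupied becomes available to resident allocations again.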

    34. Pool allocator • - Supports one allocation size only • + Very fast, simple pointer exchange • + Fragments, but can always allocate • + No wasted space • + Lock-free implementation possible • - Holes between allocations • + Memory can be allocated/freed in any order

    35. Pool allocator (cont'd) • In-place free list • No extra memory for book-keeping • Re-use memory of freed allocations • Point to next free entry
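
    The in-place free list can be sketched as follows: every free slot stores the pointer to the next free slot inside its own memory. The names PoolAllocator and FreeNode are assumptions for this example, which further assumes that the element size is at least sizeof(void*) and a multiple of its alignment, and that the start address is suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Fixed-size pool whose free list lives inside the unused slots themselves,
// so no extra memory is needed for book-keeping.
class PoolAllocator {
    struct FreeNode { FreeNode* next; };

public:
    PoolAllocator(void* start, std::size_t elementSize, std::size_t elementCount)
        : m_head(nullptr) {
        assert(elementSize >= sizeof(FreeNode));
        assert(elementSize % alignof(FreeNode) == 0);
        unsigned char* memory = static_cast<unsigned char*>(start);
        // Link the slots back to front so the first slot is handed out first.
        for (std::size_t i = elementCount; i > 0; --i) {
            void* slot = memory + (i - 1) * elementSize;
            m_head = ::new (slot) FreeNode{m_head};
        }
    }

    // Pops the first free slot; returns nullptr if the pool is exhausted.
    void* Allocate() {
        if (!m_head) {
            return nullptr;
        }
        FreeNode* node = m_head;
        m_head = node->next;
        return node;
    }

    // Pushes the slot back; its memory is re-used to store the next pointer.
    void Free(void* ptr) {
        m_head = ::new (ptr) FreeNode{m_head};
    }

private:
    FreeNode* m_head;
};
```

    Because both operations only exchange a single pointer, slots can be allocated and freed in any order at constant cost.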

    36. Micro allocator • Similar to pool allocator, but different pools for different sizes • + Very fast, lookup & simple pointer exchange • + Fragments, but can always allocate • - Some wasted space depending on size • + Can use pool-local critical sections / lock-free • - Holes between allocations • + Memory can be allocated/freed in any order
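
    The lookup step of a micro allocator, and the wasted space it implies, can be illustrated with a small size-class table. The size classes below (8 to 256 bytes) and the function names FindSizeClass and WastedBytes are assumptions chosen for this sketch; each class would be served by its own pool.

```cpp
#include <cassert>
#include <cstddef>

// Assumed size classes; each one would be backed by a separate pool.
constexpr std::size_t kSizeClasses[] = {8, 16, 32, 64, 128, 256};
constexpr std::size_t kSizeClassCount =
    sizeof(kSizeClasses) / sizeof(kSizeClasses[0]);

// Returns the index of the smallest size class that fits the request, or
// kSizeClassCount if the request is too large for the micro allocator.
std::size_t FindSizeClass(std::size_t size) {
    for (std::size_t i = 0; i < kSizeClassCount; ++i) {
        if (size <= kSizeClasses[i]) {
            return i;
        }
    }
    return kSizeClassCount;
}

// Bytes lost to rounding the request up to its size class.
std::size_t WastedBytes(std::size_t size) {
    const std::size_t index = FindSizeClass(size);
    return index < kSizeClassCount ? kSizeClasses[index] - size : 0;
}
```

    Requests that do not fit any size class would typically be forwarded to a general-purpose allocator.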

    37. One-frame temporary allocator • Similar to linear allocator • Used for scratchpad allocations during a frame • Another alternative is to use stack memory • Fixed-size • alloca()

    38. Two-frame temporary allocator • Similar to one-frame temporary allocator • Ping-pong between two one-frame allocators • Results from frame N persist until frame N+1 • Useful for operations with 1 frame latency • Raycasts
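
    The ping-pong scheme can be sketched with two buffers and a per-frame switch. The names TwoFrameAllocator and BeginFrame are assumptions; to keep the sketch short, alignment handling is left out and both buffers are assumed to have the same size.

```cpp
#include <cassert>
#include <cstddef>

// Two one-frame allocators used alternately: data allocated in frame N
// stays valid during frame N+1 and is recycled in frame N+2.
class TwoFrameAllocator {
public:
    TwoFrameAllocator(void* bufferA, void* bufferB, std::size_t sizePerBuffer)
        : m_size(sizePerBuffer), m_current(1) {
        m_buffers[0] = static_cast<unsigned char*>(bufferA);
        m_buffers[1] = static_cast<unsigned char*>(bufferB);
        m_used[0] = 0;
        m_used[1] = 0;
    }

    // Call once at the start of every frame: switches to the other buffer
    // and recycles it, because it holds data from two frames ago.
    void BeginFrame() {
        m_current ^= 1;
        m_used[m_current] = 0;
    }

    // Bump allocation from the current frame's buffer (no alignment handling).
    void* Allocate(std::size_t size) {
        if (size > m_size - m_used[m_current]) {
            return nullptr;
        }
        void* result = m_buffers[m_current] + m_used[m_current];
        m_used[m_current] += size;
        return result;
    }

private:
    unsigned char* m_buffers[2];
    std::size_t m_used[2];
    std::size_t m_size;
    unsigned int m_current;
};
```

    A raycast requested in frame N can write its result into memory from this allocator and have it read back safely in frame N+1.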

    39. Double-buffered I/O allocator • Two ping-pong buffers • Read into buffer A, consume from buffer B • Initiate reads while consuming • Useful for async. sequential reads from disk • Interface offers Consume() only • Async. reads done transparently & interleaved

    40. General-purpose • Must cope with small & large allocations • Used for 3rd party libraries • Properties • - Slow • - Fragmentation • - Wasted memory, allocation overhead • - Must use heavy-weight synchronization

    41. General-purpose (cont'd) • Common implementations • „High Performance Heap Allocator“ in GPG7 • Doug Lea's „dlmalloc“ • Emery Berger's „Hoard“

    42. Growing allocators • With virtual memory • Reserve worst-case address range up front • Back with physical memory when growing • Less hassle during development • Can grow without relocating allocations • Without virtual memory • Resize allocator for e.g. each level • Needs adjustment during development

    43. Allocators • Separate how from where • How = allocator • Where = heap or stack • Offers more possibilities • Allows using stack memory with different allocators

    44. Summary • No allocator fits all purposes • Each allocator has different pros/cons • Ideally, for each allocation think about • Size • Frequency • Lifetime • Threading

    45. Allocation strategies

    46. Why do we need a strategy? • Using a general-purpose allocator everywhere leads to • Fragmented memory • Wasted memory • Somewhat unclear memory ownership • Excessive clean-up before shipping • We can do better!

    47. Decision criteria • Lifetime • Application lifetime • Level lifetime • Temporary

    48. Decision criteria (cont'd) • Purpose • Temporary while loading a level • Temporary during a frame • Purely visual (e.g. bullet holes) • LRU scheme • Streaming I/O • Gameplay critical

    49. Decision criteria (cont'd) • Frequency • Once • Each level load • Each frame • N times per frame • Should be avoided in the first place

    50. Where would you put those? • Application-wide singleton allocations • Render queue/command buffer allocations • Level assets • Particles • Bullets, collision points, … • 3rd party allocations • Strings