Efficient Dynamic Heap Allocation of Scratch-Pad Memory

Ross McIlroy, Peter Dickman and Joe Sventek. Carnegie Trust for the Universities of Scotland. Scratch-Pad Memory Allocator (SMA): a dynamic memory allocator targeting extremely small memories (< 1MB in size).

Presentation Transcript


  1. Carnegie Trust for the Universities of Scotland • Efficient Dynamic Heap Allocation of Scratch-Pad Memory • Ross McIlroy, Peter Dickman and Joe Sventek

  2. Scratch-Pad Memory Allocator SMA: A dynamic memory allocator targeting extremely small memories (< 1MB in size) • Why target such tiny memories? • Why provide dynamic memory allocation for such small memories?

  3. Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work

  4. Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work

  5. What Tiny Memories? • Embedded Systems • Sensor Network Motes • Vehicular Devices • Scratch-Pad Memories • Network Processors • Heterogeneous Multi-Core Processors

  6. Scratch-Pad Memories • Memory is structured as a hierarchy • Small fast memories, large slow memories • The hierarchy is usually hidden by hardware caches • Some processor architectures employ scratch-pad memories instead • Similar in size and speed to caches, but explicitly accessible by software • Examples • IBM Cell processor • Intel IXP network processors • Intel PXA mobile phone processors

  7. Why Dynamic Management? • Developers want as much useful data in the fast Scratch-Pad memory as possible • They don’t want to deal with the fragmented memory hierarchy

  8. Why SMA? Managing 4kB Scratch-Pad memory on an Intel IXP processor

  9. Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work

  10. Basic Approach • By default represent memory coarsely as a series of fixed size blocks • Can employ a very simple bitmap based allocation / free algorithm • When required, split blocks into variable sized regions • Prevents excessive internal fragmentation

  11. Large Block Allocation • Each block in memory is represented by a bit in a free-block bitmap, e.g. 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 0 1 1 0 1 • Bitmap operations shown on the slide: rem_blocks = blocks_bm & ~mask; next_pos = ffs(rem_blocks); in_use = mask & ~blocks_bm; next_pos = fls(in_use) + 1;
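
The bitmap search on this slide can be sketched in C roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a 32-block memory held in one `uint32_t` where a set bit means "free", and the names `alloc_blocks`, `free_blocks`, `run_mask`, `lowest_set` and `highest_set` are hypothetical. The two helper loops stand in for the `ffs` and `fls` calls on the slide so that the sketch stays portable.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BLOCKS 32u  /* assumption: the whole memory fits in one 32-bit bitmap */

/* Index of the lowest set bit, or -1 if none (portable stand-in for ffs). */
static int lowest_set(uint32_t x) {
    if (x == 0) return -1;
    int i = 0;
    while (!(x & 1u)) { x >>= 1; i++; }
    return i;
}

/* Index of the highest set bit, or -1 if none (portable stand-in for fls). */
static int highest_set(uint32_t x) {
    int i = -1;
    while (x) { x >>= 1; i++; }
    return i;
}

/* Mask with n consecutive bits set, starting at bit pos. */
static uint32_t run_mask(unsigned pos, unsigned n) {
    assert(n >= 1 && pos + n <= NUM_BLOCKS);
    uint32_t ones = (n == 32u) ? UINT32_MAX : ((UINT32_C(1) << n) - 1u);
    return ones << pos;
}

/* Claim n contiguous free blocks; returns the first block index or -1. */
int alloc_blocks(uint32_t *free_bm, unsigned n) {
    if (n == 0 || n > NUM_BLOCKS) return -1;
    int pos = lowest_set(*free_bm);
    while (pos >= 0 && (unsigned)pos + n <= NUM_BLOCKS) {
        uint32_t mask = run_mask((unsigned)pos, n);
        uint32_t in_use = mask & ~*free_bm;   /* blocks in the window already taken */
        if (in_use == 0) {
            *free_bm &= ~mask;                /* clear the bits: blocks now in use */
            return pos;
        }
        /* Skip past the highest in-use block inside the window. */
        unsigned skip = (unsigned)highest_set(in_use) + 1u;
        if (skip >= NUM_BLOCKS) break;
        uint32_t rem_blocks = *free_bm & ~run_mask(0, skip);
        pos = lowest_set(rem_blocks);         /* next free block after the obstacle */
    }
    return -1;
}

/* Return n blocks starting at pos to the free bitmap. */
void free_blocks(uint32_t *free_bm, unsigned pos, unsigned n) {
    *free_bm |= run_mask(pos, n);
}
```

With a bitmap of `0x76` (blocks 1, 2, 4, 5 and 6 free), a request for three blocks skips the two-block run at 1 and returns block 4. The only allocator state is the bitmap word itself, which is the low-overhead property the slide is pointing at.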

  12. Small Region Allocation • Unused parts of an allocated block can be reused by sub-block sized allocations • Blocks are split into power of two sized regions, in a Binary Buddy type approach • Free regions are stored in per-size free lists
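
A buddy-style split with per-size free lists might look like the sketch below. It makes several simplifying assumptions that are not in the slides: a single 64-byte block, an 8-byte minimum region, free lists kept as small arrays of byte offsets, and a whole block handed over for splitting rather than only the unused tail of an allocated block. All names (`free_lists`, `add_block`, `alloc_region`) are hypothetical.

```c
#include <assert.h>

#define BLOCK_SIZE  64u  /* assumed block size in bytes */
#define MIN_REGION  8u   /* assumed smallest region size */
#define NUM_CLASSES 4    /* size classes: 8, 16, 32, 64 bytes */
#define MAX_FREE    8    /* BLOCK_SIZE / MIN_REGION entries per class */

/* One stack of free region offsets per power-of-two size class. */
typedef struct {
    unsigned offs[NUM_CLASSES][MAX_FREE];
    int count[NUM_CLASSES];
} free_lists;

static unsigned class_size(int c) { return MIN_REGION << c; }

/* Smallest class whose regions can hold size bytes, or -1 if too large. */
static int class_for(unsigned size) {
    for (int c = 0; c < NUM_CLASSES; c++)
        if (class_size(c) >= size) return c;
    return -1;
}

void fl_init(free_lists *fl) {
    for (int c = 0; c < NUM_CLASSES; c++) fl->count[c] = 0;
}

static void fl_push(free_lists *fl, int c, unsigned off) {
    assert(fl->count[c] < MAX_FREE);
    fl->offs[c][fl->count[c]++] = off;
}

/* Make a whole block (at byte offset off) available for splitting. */
void add_block(free_lists *fl, unsigned off) {
    fl_push(fl, NUM_CLASSES - 1, off);
}

/* Allocate a region of at least size bytes; returns its offset or -1. */
int alloc_region(free_lists *fl, unsigned size) {
    int want = class_for(size);
    if (size == 0 || want < 0) return -1;
    int c = want;
    while (c < NUM_CLASSES && fl->count[c] == 0) c++;  /* smallest class with a free region */
    if (c == NUM_CLASSES) return -1;
    unsigned off = fl->offs[c][--fl->count[c]];
    while (c > want) {                                 /* halve until the region fits */
        c--;
        fl_push(fl, c, off + class_size(c));           /* upper half goes on a free list */
    }
    return (int)off;
}
```

A 10-byte request against a fresh block is rounded up to 16 bytes: the block is split into 32 + 32, then 16 + 16, the region at offset 0 is returned, and the halves at offsets 16 and 32 are left on the 16-byte and 32-byte lists for later requests.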

  13. Coalescing Freed Regions • We wanted to avoid boundary tags • Instead the orderly way in which regions are split is exploited • A word sized coalesce tag stores the coalesce details for all regions in a block
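
The slides do not give the layout of the coalesce tag, so the following is only one plausible encoding, offered to show how a single word per block can replace per-region boundary tags. It assumes a 64-byte block, an 8-byte minimum region, one tag bit per minimum-size unit (set means free), and eager coalescing; `region`, `unit_mask` and `tag_free` are hypothetical names. Because regions are always split in halves, the buddy of a region at offset `off` with size `size` sits at `off ^ size`, and it can be merged when all of its unit bits are set.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 64u  /* assumed block size in bytes */
#define MIN_REGION 8u   /* assumed smallest region; one tag bit per 8-byte unit */

typedef struct { unsigned off, size; } region;

/* Tag bits covering the region [off, off + size). */
static uint32_t unit_mask(unsigned off, unsigned size) {
    unsigned first = off / MIN_REGION;
    unsigned n = size / MIN_REGION;
    return ((UINT32_C(1) << n) - 1u) << first;
}

/* Mark a region free in the tag, then merge with free buddies as far as
 * possible. Returns the (possibly larger) region that is now free.
 * Assumes off is a multiple of size and size is a power of two. */
region tag_free(uint32_t *tag, unsigned off, unsigned size) {
    *tag |= unit_mask(off, size);
    while (size < BLOCK_SIZE) {
        unsigned buddy = off ^ size;            /* buddy offset from the split order */
        uint32_t bm = unit_mask(buddy, size);
        if ((*tag & bm) != bm) break;           /* buddy at least partly in use */
        if (buddy < off) off = buddy;           /* merged region starts at the lower half */
        size *= 2;
    }
    return (region){ off, size };
}
```

In a full allocator the merged buddy would also have to be removed from its free list; that bookkeeping is left out here to keep the tag logic visible.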

  14. Deferred Coalescing • SMA (CAM) • Any size can have coalescing deferred • Content addressable memory is used to associate the size of deferred coalesced regions with the regions themselves • SMA (LM) • Sizes for which coalescing can be deferred are chosen at compile time • Deferred regions are stored in an array in local memory

  15. Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work

  16. Experimental Setup • Intel IXP 2350 • Network processor • 4 microengine cores with 4kB local scratch-pad each • Access to another 16kB of shared scratch-pad • Compared against Doug Lea’s malloc

  17. Allocation Performance

  18. Free Performance

  19. Memory Wastage

  20. Memory Wastage

  21. Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work

  22. Lock-Free Block Allocation • State for large blocks is stored in the free-block bitmap • A simple lock-free update algorithm can be used to protect this bitmap • Uses the test and clear primitive • [Figure: global free-block bitmap updated by Thread 1 and Thread 2; operations labelled Test & Clear and Atomic Set]
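
A lock-free claim on the free-block bitmap could be sketched with C11 atomics as below. This is an assumption-laden illustration: the hardware test and clear primitive is approximated with `atomic_fetch_and`, the atomic set with `atomic_fetch_or`, a set bit means "free", and the function names `try_claim_blocks` and `release_blocks` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Try to claim every block in mask. Returns true only if all of them
 * were free at the instant of the atomic update. */
bool try_claim_blocks(_Atomic uint32_t *free_bm, uint32_t mask) {
    /* Test and clear: clear the requested bits and fetch the old value. */
    uint32_t old = atomic_fetch_and(free_bm, ~mask);
    uint32_t got = old & mask;          /* bits that were free and are now ours */
    if (got == mask) return true;
    /* Another thread holds part of the run: hand back what we cleared. */
    if (got != 0) atomic_fetch_or(free_bm, got);   /* atomic set */
    return false;
}

/* Return blocks to the bitmap. */
void release_blocks(_Atomic uint32_t *free_bm, uint32_t mask) {
    atomic_fetch_or(free_bm, mask);
}
```

A thread that loses the race never blocks; it restores the bits it cleared and can retry at another position. While those bits are cleared, a third thread may briefly see the blocks as in use and skip them, which costs a missed opportunity but never a double allocation.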

  23. Protecting Small Region Lists • Locks are used to protect the free-lists used for small size allocation • SMA Coarse uses one lock • SMA Fine uses one lock per size class • In SMA Fine, when regions are being coalesced, two locks must be held briefly

  24. Concurrency Scaling

  25. Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work

  26. Future Work • Provide the illusion of a single memory • Let runtime worry about data placement • Data can be annotated to give hints to the runtime system

  27. Conclusion • Tiny memories need to be managed too • SMA is a simple and efficient algorithm for dynamic management of small memories • Fixed size block allocation is simple and has low state overheads • Splitting partially used blocks to be reused by small allocations limits fragmentation • SMA can be augmented to support concurrent requests from multiple cores

  28. Questions?

  29. 16kB Management Allocation

  30. 16kB Management Free

  31. 16kB Management Waste
