
Avoiding Initialization Misses to the Heap


Presentation Transcript


  1. Avoiding Initialization Misses to the Heap Jarrod Lewis, Bryan Black, and Mikko H. Lipasti Department of Electrical and Computer Engineering University of Wisconsin—Madison Intel Labs http://www.ece.wisc.edu/~pharm

  2. Motivation
     • Memory bandwidth is expensive
     • Shouldn’t waste on useless traffic
     • Can be put to better use
     • Multithreading, prefetching, MLP, etc.
     • Search and destroy useless traffic
     • Focus of this talk: heap initialization
     • Detect and optimize initialization of newly allocated memory
     23% of misses in 2MB cache are invalid
     Avoiding Initialization Misses to the Heap – Mikko Lipasti

  3. Dynamically Allocated Memory
     • Invalid memory need not be transferred
     • Provide interface that expresses this directly?
     [Figure: state diagram of heap space. States: Unallocated Invalid, Allocated Invalid, Allocated Valid. Edge labels: malloc(), initializing store, load or store, free()]
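
A minimal sketch of the state diagram on this slide, written as a transition table. It assumes that free() is legal from either allocated state and that events not drawn in the diagram leave the state unchanged; the names `HeapState`, `TRANSITIONS`, `step`, and `is_initializing_store` are hypothetical and do not come from the talk.

```python
from enum import Enum


class HeapState(Enum):
    UNALLOCATED_INVALID = "unallocated-invalid"
    ALLOCATED_INVALID = "allocated-invalid"
    ALLOCATED_VALID = "allocated-valid"


# (current state, event) -> next state, following the edge labels on the slide.
TRANSITIONS = {
    (HeapState.UNALLOCATED_INVALID, "malloc"): HeapState.ALLOCATED_INVALID,
    (HeapState.ALLOCATED_INVALID, "store"): HeapState.ALLOCATED_VALID,  # initializing store
    (HeapState.ALLOCATED_INVALID, "free"): HeapState.UNALLOCATED_INVALID,
    (HeapState.ALLOCATED_VALID, "load"): HeapState.ALLOCATED_VALID,
    (HeapState.ALLOCATED_VALID, "store"): HeapState.ALLOCATED_VALID,
    (HeapState.ALLOCATED_VALID, "free"): HeapState.UNALLOCATED_INVALID,
}


def step(state, event):
    """Return the next state; unlisted events leave the state unchanged (assumption)."""
    return TRANSITIONS.get((state, event), state)


def is_initializing_store(state, event):
    """A store to allocated-invalid memory overwrites data that need not be transferred."""
    return state is HeapState.ALLOCATED_INVALID and event == "store"
```

Walking a block through malloc, store, free returns it to the unallocated-invalid state, and only the first store after malloc counts as an initializing store.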

  4. Talk Outline
     • Motivation
     • Analysis of Heap Behavior
     • Detecting Initializing Writes
     • Performance Analysis
     • Conclusions

  5. Allocation Analysis
     • Two main modes
     • Single dominant allocation (up to 100MB) or
     • Numerous moderate allocations
     • Initialization of allocations
     • 88% initialized with store miss
     • Little temporal reuse of free’d memory
     • Phase behavior
     • Start of program often dominates
     • Even SPEC has counterexamples (gcc, vortex)

  6. Cache Miss Behavior
     • Init stores cause up to 60% of misses (avg 23%)
     • These are 35% of all compulsory misses

  7. Talk Outline
     • Motivation
     • Analysis of Heap Behavior
     • Detecting Initializing Writes
     • Performance Analysis
     • Conclusions

  8. Detecting Initializing Writes
     • Annotate malloc()
     • Record base, size in allocation range cache
     • Key questions
     • What is working set?
     • How are ranges represented?
     • Valid bits? Not scalable for 100MB allocation
     • Base + bound
     • How are ranges updated on writes?
     • Split vs. truncate
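
The allocation range cache and the split vs. truncate question can be sketched roughly as follows. This is an illustration under assumptions, not the hardware described in the talk: the class and function names are hypothetical, LRU replacement is an assumed policy, the 64-byte block size is taken from the machine model slide, and "truncate keeps the larger remainder" is only one possible truncation policy.

```python
from collections import OrderedDict

BLOCK = 64  # cache block size in bytes (from the machine model slide)


class AllocationRangeCache:
    """Small table of [base, bound) ranges that are allocated but not yet initialized."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._ranges = OrderedDict()  # entry id -> (base, bound), oldest first
        self._next_id = 0

    def record_malloc(self, base, size):
        """Called from an annotated malloc(): remember the new range."""
        if len(self._ranges) >= self.capacity:
            self._ranges.popitem(last=False)  # evict least recently used (assumed policy)
        self._ranges[self._next_id] = (base, base + size)
        self._next_id += 1

    def lookup(self, addr):
        """Return the (base, bound) range containing addr, or None."""
        hit = next((k for k, (b, e) in self._ranges.items() if b <= addr < e), None)
        if hit is None:
            return None
        self._ranges.move_to_end(hit)  # mark as most recently used
        return self._ranges[hit]


def split(rng, addr):
    """Remove the written block from the range; may leave two ranges to track."""
    base, bound = rng
    lo = addr - addr % BLOCK
    hi = lo + BLOCK
    return [(b, e) for b, e in ((base, lo), (hi, bound)) if b < e]


def truncate(rng, addr):
    """Keep a single range: here, the larger remainder (hypothetical policy)."""
    pieces = split(rng, addr)
    return max(pieces, key=lambda p: p[1] - p[0], default=None)
```

Splitting never loses information but can grow the number of tracked ranges, while truncation keeps one base + bound pair per allocation at the cost of giving up on part of the range.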

  9. Allocation Working Set
     • 4-8 entries sufficient, except parser needs 64

  10. Sequential Initialization
      • Forward sweep captures 90%+ except bzip, gzip, perl
      [Figure: initialization pattern 1 (Sequential) and tracking scheme 1 (Forward Sweep) over blocks A to F, with writes to A, B, C, D; legend: Allocated-Invalid, Initialized, Unknown]

  11. Alternating Initialization
      • Bidirectional captures 90%+ of perl
      • Doesn’t help bzip or gzip
      [Figure: initialization pattern 2 (Alternating) and tracking scheme 2 (Bidirectional Sweep) over blocks A to F, with writes to A, F, B, E; legend: Allocated-Invalid, Initialized, Unknown]
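
A minimal sketch of a bidirectional sweep, assuming the tracker keeps one [base, bound) range and, on each recognized write, truncates from whichever end is nearer to the written block. That nearest-end rule and the function name are assumptions made for illustration.

```python
def bidirectional_sweep_coverage(n_blocks, write_order):
    """Fraction of first writes recognized as initializing by a bidirectional sweep."""
    base, bound = 0, n_blocks
    captured = 0
    for block in write_order:
        if base <= block < bound:
            captured += 1
            if block - base <= (bound - 1) - block:
                base = block + 1   # nearer the low end: advance base
            else:
                bound = block      # nearer the high end: pull bound down
    return captured / len(write_order)


alternating = [0, 5, 1, 4, 2, 3]   # A, F, B, E, C, D
striding = [0, 2, 4, 1, 3, 5]      # A, C, E, B, D, F
print(bidirectional_sweep_coverage(6, alternating))  # 1.0
print(bidirectional_sweep_coverage(6, striding))     # 0.6666666666666666
```

With this rule the alternating pattern is fully captured, but a striding pattern still loses the blocks that were skipped over on the first pass.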

  12. Striding Initialization
      • Interleaving captures 90%+ of gzip
      • Still only 60% of bzip
      • Bzip has a large allocation with random initialization
      [Figure: initialization pattern 3 (Striding) and tracking scheme 3 (Interleaving) over blocks A to F, with writes to A, C, E, B; legend: Allocated-Invalid, Initialized, Unknown]

  13. Talk Outline
      • Motivation
      • Analysis of Heap Behavior
      • Detecting Initializing Writes
      • Performance Analysis
      • Conclusions

  14. PharmSim Overview
      • Device simulation, etc. from SimOS-PPC [IBM ARL]
      • PharmSim replaces functional simulators
      • Full OOO core model, values in rename registers
      • Supports priv. mode, MMU, TLB, exceptions, interrupts, barriers, flushes, etc.
      • Lead developer: Trey Cain (thanks Trey!)
      [Figure: simulator block diagram with labels SimOS-PPC (AIX 4.3.1, disk driver, E’net driver), Block, Simple, PharmSim (OOO core, Gigaplane), Ethernet]

  15. Operating System Effects
      • Widely accepted for SPECINT:
      • Safe to ignore O/S paths
      • Most popular tool (Simplescalar)
      • Intercepts system calls
      • Emulates on host, updates “flat” memory
      • Returns “magically” with cache contents intact
      • We have found that [CAECW2002]:
      • Omitting system references leads to dramatic error (5.8x L2 miss rate, 100% IPC in worst case)
      • Specifically, AIX page fault handler eliminates many initializing write misses
      • Had we not used PHARMsim?
      • Dramatically overstated performance benefit

  16. AIX Page Installation
      • Heap manager calls sbrk
      • Malloc returns block < 4KB
      • Program writes to block
      • First reference causes page fault
      • AIX installs entire page using dcbz
      [Figure: data segment with regions labeled Unallocated, Allocated, Valid]

  17. Block vs. Page Installation
      • Page installation
      • Practically free as part of page fault
      • Shortcomings of page installation
      • Pollutes cache
      • Not scalable to superpages (AIX v5.1)
      • Does not work for heap reuse
      • Our short simulations don’t show this benefit
      • I.e. high overlap between initializing writes and first reference to extended data segment
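
The cache pollution point can be illustrated by counting how many cache blocks each mode brings in. This is a back-of-the-envelope sketch, not the simulator's logic: the 4KB page size is assumed from the "block < 4KB" bullet on the page-installation slide, the 64-byte block size comes from the machine model slide, and the function names and addresses are hypothetical.

```python
PAGE_SIZE = 4096   # bytes; assumed page size
CACHE_BLOCK = 64   # bytes; from the machine model slide


def page_mode_install(first_write_addr):
    """Cache blocks zeroed when the whole page is installed on first reference."""
    page_base = first_write_addr - first_write_addr % PAGE_SIZE
    return list(range(page_base, page_base + PAGE_SIZE, CACHE_BLOCK))


def block_mode_install(write_addrs):
    """Cache blocks installed when each block is installed on its own initializing store."""
    return sorted({addr - addr % CACHE_BLOCK for addr in write_addrs})


# A program that initializes a 256-byte object at the start of a fresh page.
writes = range(0x10000, 0x10000 + 256, 8)
print(len(page_mode_install(0x10000)))  # 64
print(len(block_mode_install(writes)))  # 4
```

In this example page mode places 64 blocks in the cache to serve 4 blocks of useful data, which is the sense in which it can pollute the cache.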

  18. Integrating ARC

  19. Speedup
      • Very aggressive core model
      • Still can’t tolerate all store miss latency
      • Block mode slightly better than page mode
      • Cache pollution, less coverage

  20. Program Phase Behavior
      • Only benefits initialization program phase
      • Some programs initialize throughout execution
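
One way to see why the benefit depends on how much of the run is spent initializing is an Amdahl's-law style estimate. This is an illustrative model only, not the methodology of the talk; the function name and the input values in the usage note are made up.

```python
def overall_speedup(init_fraction, init_phase_speedup):
    """Amdahl's law: speedup of the whole run when only the initialization
    phase (a fraction of baseline time) gets faster."""
    if not 0.0 <= init_fraction <= 1.0:
        raise ValueError("init_fraction must be between 0 and 1")
    if init_phase_speedup <= 0:
        raise ValueError("init_phase_speedup must be positive")
    return 1.0 / ((1.0 - init_fraction) + init_fraction / init_phase_speedup)
```

For instance, `overall_speedup(0.5, 2.0)` evaluates to about 1.33, while a program that never initializes (`init_fraction` of 0) sees no change.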

  21. Conclusions
      • Initializing writes
      • Cause 23% of all misses in 2MB L2
      • Avoid miss with block or page mode install
      • Up to 41% performance improvement
      • Subject to initialization:computation ratio
      • Tracking allocation ranges
      • Working set very small (4-8 entries; 64 for parser)
      • Forward/bidirectional/interleaved sweep enables range truncation

  22. Acknowledgments
      • Originated as course project:
      • Gordie Bell, Trey Cain, Kevin Lepak
      • PHARMsim infrastructure
      • Lead developer: Trey Cain
      • Financial and equipment support
      • IBM and Intel Corp
      • National Science Foundation
      • University of Wisconsin

  23. Questions?

  24. Backup Slides

  25. Invalid Memory Traffic
      • Real data traffic that transfers invalid data
      • Initializing Store
      • Initial write to a storage location that contains invalid data

  26. Allocation Analysis
      • Single dominant allocation vs.
      • Numerous moderate allocations

  27. Initialization of Heap
      • 88% initialized by store miss
      • Relatively little temporal reuse of freed memory

  28. PharmSim Pipeline
      Pipeline stages: Fetch, Translate, Decode, Execute, Mem, Commit
      • Substantially similar to IBM Power4
      • Some instructions “cracked” (1:2 expansion)
      • Others (e.g. lmw) microcode stream
      • Mem Stage
      • Interface to 2-level cache model
      • Sun Gigaplane XB snoopy MP coherence
      • Caches contain values, must remain coherent
      • No cheating!
      • No “flat” memory model for reference/redirect

  29. Machine Model
      Unrealistically aggressive model to devalue the impact of store misses.
      • 8-wide, 6-stage pipeline
      • 8K entry combining predictor
      • 128 RUU, 64 LSQ entries, 64 write buffers
      • 256KB 4-way associative L1D cache
      • 64KB 2-way associative L1I
      • 2MB 4-way associative L2 unified cache
      • All cache blocks are 64 bytes
      • L2 latency is 10 cycles
      • Memory latency is 70 cycles
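
For reference, the cache parameters above imply the following set counts, using the standard relation sets = size / (associativity × block size). The helper name `num_sets` is hypothetical; the sizes, associativities, and 64-byte block size are taken from the slide.

```python
def num_sets(size_bytes, ways, block_bytes=64):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * block_bytes)


KB, MB = 1024, 1024 * 1024
caches = {
    "L1D": (256 * KB, 4),
    "L1I": (64 * KB, 2),
    "L2": (2 * MB, 4),
}
for name, (size, ways) in caches.items():
    print(name, num_sets(size, ways))
# L1D 1024
# L1I 512
# L2 8192
```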
