Memory ReStructuring
• From a memory/cache perspective, application behavior depends on:
  • Memory structure
  • Input data
  • Data structures
• Compiler can determine some of this behavior.
• Dynamic monitoring and re-writing of memory (data structures and data layout) can buy us much more.
Object Level Reorganization
[Figure: data/node layout of nodes A through I; label: Unit of Banking]
Object Level Reorganization
[Figure: data/node layout of nodes A through I with a hot path marked; labels: Hot Path, Unit of Banking]
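The idea in the two figures, moving the nodes on the hot path into the same unit of banking, can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the tree shape, the access counts, the bank size of three nodes, and the function names `hot_path` and `reorganize` are all assumptions made for the example.

```python
def hot_path(children, counts, root):
    """Follow the most frequently accessed child from the root down to a leaf."""
    path = [root]
    node = root
    while children.get(node):
        node = max(children[node], key=lambda c: counts[c])
        path.append(node)
    return path


def reorganize(nodes, children, counts, root, bank_size):
    """Place hot-path nodes first, then the rest, and cut the order into banks."""
    path = hot_path(children, counts, root)
    on_path = set(path)
    order = path + [n for n in nodes if n not in on_path]
    return [order[i:i + bank_size] for i in range(0, len(order), bank_size)]


# Hypothetical tree over the nodes A..I and hypothetical access counts.
nodes = list("ABCDEFGHI")
children = {"A": ["B", "C"], "B": ["D", "E", "F"], "C": ["G", "H", "I"]}
counts = {"A": 100, "B": 10, "C": 90, "D": 3, "E": 4,
          "F": 3, "G": 5, "H": 80, "I": 5}

banks = reorganize(nodes, children, counts, "A", bank_size=3)
# With these counts the hot path A, C, H ends up together in the first bank.
```

Only the placement order is modeled here; a real system would also have to move the objects and update every pointer that refers to them.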
Field Level Reorganization
[Figure: transformation; label: Unit of banking]
• Assumes that each record can be split into HOT and COLD fields
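A hot/cold field split can be sketched with two parallel byte buffers. The record layout here (an integer key as the hot field, a name and a balance as the cold fields) and the helper names `split`, `find`, and `load_cold` are assumptions for illustration only; the slides do not specify a record format.

```python
import struct

HOT = struct.Struct("<i")       # assumed hot field: 4-byte key
COLD = struct.Struct("<32sd")   # assumed cold fields: 32-byte name, 8-byte balance


def split(records):
    """Store hot and cold fields of (key, name, balance) records in separate buffers."""
    hot = bytearray(HOT.size * len(records))
    cold = bytearray(COLD.size * len(records))
    for i, (key, name, balance) in enumerate(records):
        HOT.pack_into(hot, i * HOT.size, key)
        COLD.pack_into(cold, i * COLD.size, name.encode(), balance)
    return hot, cold


def find(hot, key):
    """Scan only the dense hot buffer; cold bytes are never touched."""
    for i in range(len(hot) // HOT.size):
        if HOT.unpack_from(hot, i * HOT.size)[0] == key:
            return i
    return -1


def load_cold(cold, i):
    """Fetch the cold fields of record i on demand."""
    name, balance = COLD.unpack_from(cold, i * COLD.size)
    return name.rstrip(b"\0").decode(), balance


records = [(7, "ann", 1.5), (42, "bob", 2.25)]
hot, cold = split(records)
```

In this layout a key scan walks 4 bytes per record instead of 44, which is the locality benefit the split is meant to buy.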
[Figure: Hotness; total elements: 10,000]
[Figure: Hotness; total elements: 100,000]
[Figure: Hotness; total elements: 50,000]
Related Work
• Improving Cache Performance in Dynamic Applications through Data and Computation Reorganization at Run Time
• Chen Ding and Ken Kennedy – Rice University
• PLDI ’99 and Ding’s Thesis
• Reorders memory access on arrays at runtime via a library call
• Creates mapping from original position to new position
• Use compiler optimization to help remove mapping overhead
• Heuristic-guided memory packing
• Eliminates 97-99% of cache misses with locality grouping
• Reduces 21%-84% L2 misses with dynamic data packing
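The mapping from original position to new position can be illustrated with a small run-time packing sketch. It uses first-touch order as the packing heuristic, which is an assumption chosen for simplicity and not necessarily the heuristic used in the cited work; the function names are hypothetical as well.

```python
def first_touch_packing(trace, n):
    """Map each original index to a new slot in order of first access."""
    new_pos = [-1] * n
    next_slot = 0
    for i in trace:
        if new_pos[i] == -1:
            new_pos[i] = next_slot
            next_slot += 1
    for i in range(n):  # elements never accessed are placed at the end
        if new_pos[i] == -1:
            new_pos[i] = next_slot
            next_slot += 1
    return new_pos


def apply_packing(data, new_pos):
    """Move every element to its new slot."""
    packed = [None] * len(data)
    for old, new in enumerate(new_pos):
        packed[new] = data[old]
    return packed


def remap_trace(trace, new_pos):
    """Rewrite the index stream once so later accesses skip the mapping lookup."""
    return [new_pos[i] for i in trace]


data = ["a", "b", "c", "d", "e"]
trace = [3, 0, 3, 4, 0]  # hypothetical access sequence through an index array

new_pos = first_touch_packing(trace, len(data))
packed = apply_packing(data, new_pos)
packed_trace = remap_trace(trace, new_pos)
```

Rewriting the index array once, as `remap_trace` does, is one simple way to avoid paying for the mapping on every access; the slide attributes this overhead removal to compiler optimization.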
Related Work
• Memory Hierarchy Management for Iterative Graph Structures
• Ibraheem Al-Furaih – Syracuse University and Sanjay Ranka – University of Florida
• 12th Intl. Parallel Processing Symposium (1998)
• Partition graph into sets that can fit into cache
• And/or lay out the graph data in a breadth-first manner
• Applied as a preprocessing step
• Speedups range from 1.2 to 1.5 times
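A breadth-first layout applied as a preprocessing step can be sketched as a node renumbering. The adjacency-list representation and the names `bfs_layout` and `relabel` are assumptions for this example; the partitioning variant is not shown.

```python
from collections import deque


def bfs_layout(adj):
    """Return node ids in breadth-first order, covering every component."""
    seen = [False] * len(adj)
    order = []
    for start in range(len(adj)):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
    return order


def relabel(adj, order):
    """Renumber nodes so that position in memory follows the BFS order."""
    new_id = {old: new for new, old in enumerate(order)}
    return [[new_id[v] for v in adj[old]] for old in order]


# Hypothetical undirected graph with edges 0-3, 0-4, 1-4, 2-3.
adj = [[3, 4], [4], [3], [0, 2], [0, 1]]
order = bfs_layout(adj)
new_adj = relabel(adj, order)
```

After relabeling, neighbors tend to have nearby ids, so an iterative sweep over the node array touches memory that is closer together.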
Related Work
• Data Remapping for Design Space Optimization of Embedded Memory Systems
• Rodric M. Rabbah and Krishna V. Palem – Georgia Institute of Technology
• ACM Trans. on Embedded Computing Systems (May 2003)
• Generic, pointer-centric data reorganization done at compile time
• Use profile data of access patterns along program’s hot path
• Splits global arrays of structs into parallel arrays
• Intercept dynamic data allocation requests and pool results together
• 20% performance improvement on average
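Pooling the results of dynamic allocation requests can be pictured as serving same-sized requests from one contiguous buffer. The `Pool` class below is a hypothetical sketch with an assumed fixed slot size and capacity; it does not reproduce the cited paper's mechanism for intercepting allocations.

```python
class Pool:
    """Hand out fixed-size slots from one contiguous buffer."""

    def __init__(self, slot_size, capacity):
        self.slot_size = slot_size
        self.capacity = capacity
        self.buf = bytearray(slot_size * capacity)
        self.used = 0

    def alloc(self):
        """Return the byte offset of the next free slot."""
        if self.used == self.capacity:
            raise MemoryError("pool exhausted")
        offset = self.used * self.slot_size
        self.used += 1
        return offset

    def view(self, offset):
        """Return a writable view of the slot that starts at offset."""
        return memoryview(self.buf)[offset:offset + self.slot_size]


pool = Pool(slot_size=16, capacity=2)
first = pool.alloc()
second = pool.alloc()
pool.view(second)[:2] = b"hi"
```

Objects allocated one after another land in adjacent slots, so structures built in allocation order stay close together in memory.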