
Memory Hierarchy Adaptivity: An Architectural Perspective


Presentation Transcript


  1. Memory Hierarchy Adaptivity: An Architectural Perspective
  Alex Veidenbaum
  AMRM Project, sponsored by DARPA/ITO

  2. Opportunities for Adaptivity
  • Cache organization
  • Cache performance “assist” mechanisms
  • Hierarchy organization
  • Memory organization (DRAM, etc.)
  • Data layout and address mapping
  • Virtual memory
  • Compiler assist

  3. Opportunities - Cont’d
  • Cache organization: adapt what?
    • Size: NO
    • Associativity: NO
    • Line size: MAYBE
    • Write policy: YES (fetch, allocate, write-back/through) — sketched below
    • Mapping function: MAYBE
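
Of the candidates above, write policy is the one clear “YES”. A minimal simulator-style sketch of how a per-region write-policy selector could be driven by observed rewrite behavior; the 4 KB region grain, the 1024-write evaluation interval, and the threshold are illustrative assumptions, not AMRM specifics:

```cpp
#include <cstdint>
#include <unordered_map>

enum class WritePolicy { WriteThrough, WriteBackAllocate };

struct RegionStats {
    uint64_t writes = 0;
    uint64_t rewrites = 0;   // writes that hit an already-dirty line
    WritePolicy policy = WritePolicy::WriteThrough;
};

class WritePolicySelector {
    std::unordered_map<uint64_t, RegionStats> regions_;  // keyed by page number
public:
    WritePolicy policyFor(uint64_t addr) {
        return regions_[addr >> 12].policy;              // 4 KB regions
    }
    void recordWrite(uint64_t addr, bool hitDirtyLine) {
        RegionStats& r = regions_[addr >> 12];
        ++r.writes;
        if (hitDirtyLine) ++r.rewrites;
        // Re-evaluate every 1024 writes: regions that repeatedly rewrite the
        // same lines benefit from write-back with allocation; streaming
        // writes are cheaper with write-through, no-allocate.
        if (r.writes % 1024 == 0) {
            r.policy = (r.rewrites * 2 > r.writes)
                           ? WritePolicy::WriteBackAllocate
                           : WritePolicy::WriteThrough;
            r.rewrites = 0;
        }
    }
};
```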

  4. Opportunities - Cont’d
  • Cache “assist” mechanisms: prefetch, write buffer, victim cache, etc., between different levels
  • Adapt what?
    • Which mechanism(s) to use
    • Mechanism “parameters”

  5. Opportunities - Cont’d
  • Hierarchy organization:
    • Where are cache assist mechanisms applied?
      • Between L1 and L2
      • Between L1 and memory
      • Between L2 and memory
    • What are the data paths like?
      • Is prefetch, victim cache, or write buffer data written into the cache?
    • How much parallelism is possible in the hierarchy?

  6. Opportunities - Cont’d
  • Memory organization
    • Cached DRAM?
    • Interleave change?
    • PIM (processing-in-memory)

  7. Opportunities - Cont’d
  • Data layout and address mapping
    • In theory, something can be done, but…
    • The MP case is even worse
    • Adaptive address mapping or hashing based on ???

  8. Opportunities - Cont’d
  • Compiler assist
    • Can select the initial configuration
    • Pass hints on to hardware
    • Generate code to collect run-time info and adjust execution (see the sketch below)
    • Adapt configuration after being “called” at certain intervals during execution
    • Select/run-time-optimize code
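
A minimal sketch of the “generate code to collect run-time info and adjust execution” idea: the compiler emits periodic calls to a tuning routine that reads a miss counter and adjusts one hardware parameter. The hook name, the stubbed hardware accessors, and the thresholds are all hypothetical, not AMRM interfaces:

```cpp
#include <cstdint>

// Placeholders for hardware access; a real system would read a performance
// counter and write a prefetch-depth configuration register.
static uint64_t read_miss_counter() { return 0; }  // stub
static void set_prefetch_depth(int) {}             // stub

static uint64_t last_misses = 0;
static int depth = 2;

// Invoked every N iterations by compiler-inserted code (N chosen statically).
void amrm_tune(uint64_t accesses_since_last) {
    uint64_t m = read_miss_counter();
    double miss_rate = accesses_since_last
        ? double(m - last_misses) / double(accesses_since_last) : 0.0;
    // Crude illustrative policy: deepen prefetching while misses stay high,
    // back off when the miss rate is already low.
    if (miss_rate > 0.10 && depth < 8)      ++depth;
    else if (miss_rate < 0.02 && depth > 1) --depth;
    set_prefetch_depth(depth);
    last_misses = m;
}
```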

  9. Opportunities - Cont’d
  • Virtual memory can adapt:
    • Page size?
    • Mapping?
    • Page prefetching/read-ahead (see the example below)
    • Write buffer (file cache)
    • All of the above under multiprogramming?
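
Page read-ahead is the one item above with a direct present-day software analogue: the standard POSIX posix_madvise call lets an application ask the VM system to prefetch a mapped region before it is touched. The example below illustrates that OS-side hook only, not an AMRM mechanism; the file name is hypothetical:

```cpp
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("data.bin", O_RDONLY);   // hypothetical input file
    if (fd < 0) { perror("open"); return 1; }
    off_t len = lseek(fd, 0, SEEK_END);
    void* p = mmap(nullptr, size_t(len), PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    // Ask the VM system to read the pages in ahead of first touch.
    if (posix_madvise(p, size_t(len), POSIX_MADV_WILLNEED) != 0)
        std::fprintf(stderr, "posix_madvise failed\n");
    // ... scan the mapped data here ...
    munmap(p, size_t(len));
    close(fd);
    return 0;
}
```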

  10. Applying Adaptivity
  • What drives adaptivity? Performance impact, overall and/or relative:
    • “Effectiveness”, e.g. miss rate
    • Processor stall cycles introduced (see the metrics sketch below)
    • Program characteristics
  • When to perform an adaptive action:
    • Run time: use feedback from hardware
    • Compile time: insert code, set up hardware
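
A small sketch of the per-interval metrics such a driver would need. Miss rate alone can mislead, since a miss whose latency is overlapped by other work introduces no stall, so both counts are kept side by side; the struct and helper names are illustrative:

```cpp
#include <cstdint>

struct IntervalStats {
    uint64_t accesses = 0;
    uint64_t misses = 0;
    uint64_t stallCycles = 0;   // processor stall attributable to this level
};

// Relative "effectiveness" over one observation interval.
inline double missRate(const IntervalStats& s) {
    return s.accesses ? double(s.misses) / double(s.accesses) : 0.0;
}

// Actual performance impact per access, which may diverge from miss rate
// when misses are overlapped with useful work.
inline double stallPerAccess(const IntervalStats& s) {
    return s.accesses ? double(s.stallCycles) / double(s.accesses) : 0.0;
}
```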

  11. Where to Implement
  • In software: compiler and/or OS
    • (Static) knowledge of program behavior
    • Factored into optimization and scheduling
    • Extra code, overhead
    • Lack of dynamic run-time information
    • Low rate of adaptivity: requires recompilation or OS changes

  12. Where to Implement - Cont’d
  • Hardware
    • Dynamic information available
    • Fast decision mechanism possible
    • Transparent to software (thus safe)
    • Delay and clock rate limit algorithm complexity
    • Difficult to maintain long-term trends
    • Little knowledge of program behavior

  13. Where to Implement - Cont’d
  • Hardware/software
    • Software can set coarse hardware parameters
    • Hardware can supply software with dynamic info
    • Perhaps more complex algorithms can be used
    • Software modification required
    • Communication mechanism required (see the sketch below)
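
One plausible form for that communication mechanism is a small set of memory-mapped control and status registers: software writes coarse parameters, hardware exposes counters back. A minimal sketch; the addresses and register layout are entirely hypothetical:

```cpp
#include <cstdint>

// Hypothetical register map; real addresses would come from the platform.
volatile uint32_t* const ASSIST_SELECT =
    reinterpret_cast<volatile uint32_t*>(0xFFFF0000u);  // policy bitmask
volatile uint32_t* const BUFFER_ALLOC =
    reinterpret_cast<volatile uint32_t*>(0xFFFF0004u);  // buffers per mechanism
volatile uint32_t* const MISS_COUNT =
    reinterpret_cast<volatile uint32_t*>(0xFFFF0008u);  // read-only counter
volatile uint32_t* const STALL_CYCLES =
    reinterpret_cast<volatile uint32_t*>(0xFFFF000Cu);  // read-only counter

// Software sets coarse parameters...
void configure_assists(uint32_t mechanism_mask, uint32_t buffers_each) {
    *ASSIST_SELECT = mechanism_mask;   // e.g. prefetch | victim | write-buffer
    *BUFFER_ALLOC  = buffers_each;
}

// ...and hardware supplies dynamic information back.
void read_feedback(uint32_t& misses, uint32_t& stalls) {
    misses = *MISS_COUNT;
    stalls = *STALL_CYCLES;
}
```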

  14. Current Investigation
  • L1 cache assist
  • Wide variability is seen in the effectiveness of assist mechanisms:
    • Between individual programs
    • Within a program, as a function of time
  • Propose hardware mechanisms to select between assist types and allocate buffer space
  • Give the compiler an opportunity to set parameters

  15. Mechanisms Used
  • Prefetching
    • Stream buffers
    • Stride-directed, based on address alone (see the sketch below)
    • Miss stride: prefetch the same address using the number of intervening misses
  • Victim cache
  • Write buffer
  • All mechanisms are placed after L1
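
A compact sketch of the stride-directed, address-only variant: on each L1 miss the new miss address is compared with the previous one, and once the same stride is seen twice in a row, the next few lines are placed in a small stream buffer. The line size, buffer depth, and two-strike confirmation rule are illustrative choices, not the studied configuration:

```cpp
#include <cstdint>
#include <deque>

constexpr int kLineBytes = 32;
constexpr int kBufferDepth = 4;

class StrideStreamBuffer {
    uint64_t lastMissLine_ = 0;
    int64_t  stride_ = 0;
    bool     strideConfirmed_ = false;
    std::deque<uint64_t> buffer_;   // line addresses held by the buffer
public:
    // Returns true if the access hits in the stream buffer.
    bool access(uint64_t addr) {
        uint64_t line = addr / kLineBytes;
        for (uint64_t b : buffer_)
            if (b == line) return true;
        return false;
    }
    // Called on an L1 miss; stride-directed from addresses alone, no PC.
    void onMiss(uint64_t addr) {
        uint64_t line = addr / kLineBytes;
        int64_t s = int64_t(line) - int64_t(lastMissLine_);
        strideConfirmed_ = (s != 0 && s == stride_);  // same stride twice
        stride_ = s;
        lastMissLine_ = line;
        if (strideConfirmed_) {
            buffer_.clear();
            for (int i = 1; i <= kBufferDepth; ++i)
                buffer_.push_back(line + uint64_t(i) * uint64_t(stride_));
        }
    }
};
```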

  16. Mechanisms Used - Cont’d
  • A mechanism can be used by itself, or all are used at once
  • Buffer space size and organization are fixed
  • No adaptivity involved

  17. Observed Behavior
  • Programs benefit differently from each mechanism; none is a consistent winner
  • Within a single program, the same holds in the time domain: the best mechanism varies as execution proceeds

  18. Observed Behavior - Cont’d
  • Both of the above facts indicate a likely improvement from adaptivity:
    • Select the better mechanism at any given time
    • Even more can be expected from adaptively re-allocating the combined buffer pool
      • To reduce stall time
      • To reduce the number of misses

  19. Proposed Adaptive Mechanism
  • Hardware:
    • A common pool of 2-4-word buffers
    • A set of possible policies, a subset of:
      • Stride-directed prefetch
      • PC-based prefetch
      • History-based prefetch
      • Victim cache
      • Write buffer

  20. Adaptive Hardware - Cont’d
  • Performance monitors for each type/buffer:
    • Misses, stall time on hit, thresholds
  • Dynamic buffer allocator among mechanisms
  • Allocation and monitoring policy (see the sketch below):
    • Predict future behavior from the observed past
    • Observe over a time interval dT, then set the next interval
    • Save performance trends in next-level tags (<8 bits)
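
A sketch of how the monitors and allocator could fit together, under stated assumptions: each mechanism's monitor accumulates the stall it saved during the interval dT, and at interval boundaries the pool is re-divided in proportion to those savings (truncation remainders are simply left unallocated). The proportional rule and counter layout are illustrative, not the AMRM design:

```cpp
#include <array>
#include <cstdint>

enum Mechanism { STRIDE_PF, PC_PF, HISTORY_PF, VICTIM, WRITE_BUF, N_MECH };

struct Monitor {
    uint64_t hits = 0;          // misses this mechanism turned into hits
    uint64_t stallSaved = 0;    // estimated stall cycles avoided
};

class BufferAllocator {
    std::array<Monitor, N_MECH> mon_{};
    std::array<int, N_MECH> alloc_{};   // buffers currently owned
    const int pool_;                    // total 2-4-word buffers available
public:
    explicit BufferAllocator(int pool) : pool_(pool) {
        for (int& a : alloc_) a = pool_ / N_MECH;   // start with an even split
    }
    Monitor& monitor(Mechanism m) { return mon_[m]; }

    // Called once per observation interval dT: predict the next interval
    // from the past one and redistribute the pool accordingly.
    void rebalance() {
        uint64_t total = 0;
        for (const Monitor& m : mon_) total += m.stallSaved;
        for (int i = 0; i < N_MECH; ++i) {
            alloc_[i] = total
                ? int((mon_[i].stallSaved * uint64_t(pool_)) / total)
                : pool_ / N_MECH;
            mon_[i] = Monitor{};        // reset counters for the next interval
        }
    }
    int buffersFor(Mechanism m) const { return alloc_[m]; }
};
```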

  21. Further Opportunities to Adapt
  • L2 cache organization
    • Variable-size line
  • L2 non-sequential prefetch
  • In-memory assists (DRAM)

  22. MP Opportunities
  • Even longer latency
  • Coherence, hardware or software
  • Synchronization
  • Prefetch under and beyond the above:
    • Avoid coherence if possible
    • Prefetch past synchronization
  • Assist adaptive scheduling
