
Learning Cache Models by Measurements

Presentation Transcript


  1. Learning Cache Models by Measurements Jan Reineke joint work with Andreas Abel Uppsala University December 20, 2012

  2. The Timing Analysis Problem [Diagram: Embedded Software + Microarchitecture → do they meet the Timing Requirements?]

  3. What does the execution time of a program depend on? • Input-dependent control flow • Microarchitectural state: pipeline, memory hierarchy, interconnect

  4. Example of Influence of Microarchitectural State Motorola PowerPC 755

  5. Typical Structure of Timing Analysis Tools [Diagram: ISA-specific analysis parts and a microarchitecture-specific part]

  6. Construction of Timing Models Needs to accurately model all aspects of the microarchitecture that influence execution time.

  7. Construction of Timing Models Today

  8. Construction of Timing Models Tomorrow

  9. Cache Models • The cache model is an important part of the timing model • Caches have a simple interface: loads + stores • Can be characterized by a few parameters: • ABC: associativity, block size, capacity • Replacement policy

  10. High-level Approach • Generate memory access sequences • Determine number of cache misses on these sequences • using performance counters, or • by measuring execution times. • Infer property from measurement results.
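
For the performance-counter option on Linux, one way to count misses around a measured access sequence is the perf_event_open interface. The sketch below is my illustration (event choice, workload, and sizes are assumptions), not the authors' tooling:

```c
// Sketch: counting L1D read misses around a measured access sequence via
// Linux perf events. The event choice and the example workload are
// illustrative assumptions, not the setup used in the talk.
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

static long long count_l1d_misses(void (*workload)(void)) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HW_CACHE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CACHE_L1D
                | (PERF_COUNT_HW_CACHE_OP_READ << 8)
                | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    workload();                              // the accesses to measure
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long misses = -1;
    read(fd, &misses, sizeof misses);
    close(fd);
    return misses;
}

static char buf[1 << 16];
static void example_workload(void) {         // placeholder access sequence
    for (size_t i = 0; i < sizeof buf; i += 64)
        (void)*(volatile char *)&buf[i];
}

int main(void) {
    printf("L1D read misses: %lld\n", count_l1d_misses(example_workload));
    return 0;
}
```

Where counters are unavailable, timing the sequence and comparing against the expected all-hit latency serves the same purpose.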

  11. Warm-up I: Inferring the Cache Capacity Basic idea: • Access a set of memory blocks A once. • Then access A again and count cache misses. • If A fits into the cache, no misses will occur. Notation: preparatory accesses are written first, followed by the accesses on which misses are measured. A sketch of the experiment follows.
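
A minimal sketch of this warm-up, using timing as a stand-in for miss counting (the 64-byte stride, the size range, and the timing proxy are my assumptions):

```c
// Sketch of the capacity warm-up: load a set A of blocks once, then access
// A again and check whether the second pass still misses. Timing is used
// here as a proxy for miss counting.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define STRIDE 64  // assumed block size, so each access touches a new block

static double ns_per_access(volatile char *buf, size_t size) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < size; i += STRIDE)      // accesses to measure
        (void)buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (size / STRIDE);
}

int main(void) {
    for (size_t size = 4096; size <= (1 << 20); size *= 2) {
        volatile char *buf = malloc(size);
        for (size_t i = 0; i < size; i += STRIDE)  // preparatory pass: load A
            (void)buf[i];
        // Time per access jumps once A no longer fits into the cache.
        printf("%8zu bytes: %.2f ns/access\n", size, ns_per_access(buf, size));
        free((void *)buf);
    }
    return 0;
}
```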

  12. Example: Intel Core 2 Duo E6750, L1 Data Cache [Plot: |Misses| over |Size|] Way size = 4 KB, capacity = 32 KB

  13. Warm-up II: Inferring the Block Size Given: way size W and associativity A Wanted: block size B
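
The slide leaves the concrete experiment implicit; a classic stride-based measurement (a sketch of one possible approach, not necessarily the one used in the talk) exposes the block size:

```c
// Sketch: walk a buffer much larger than the cache with increasing stride.
// While stride < B, some consecutive accesses share a block and hit, so the
// average time per access grows with the stride; once stride >= B, every
// access touches a fresh block and the time per access levels off.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUF_SIZE (8u << 20)   // assumed to be well above the cache capacity

int main(void) {
    volatile char *buf = malloc(BUF_SIZE);
    for (size_t i = 0; i < BUF_SIZE; i++) buf[i] = (char)i;  // touch once

    for (size_t stride = 8; stride <= 512; stride *= 2) {
        struct timespec t0, t1;
        size_t n = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < BUF_SIZE; i += stride, n++)
            (void)buf[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("stride %4zu: %.2f ns/access\n", stride, ns / n);
    }
    free((void *)buf);
    return 0;
}
```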

  14. Inferring the Replacement Policy There are infinitely many conceivable replacement policies… • For any set of observations, multiple policies remain possible. • Need to make some assumption on possible policies to render inference possible.

  15. A Class of Replacement Policies: Permutation Policies • Permutation policies: • Maintain a total order on the blocks in each cache set • Evict the greatest block in the order • Update the order based on the position of the accessed block in the order • Can be specified by A+1 permutations: associativity-many “hit” permutations and one “miss” permutation • Examples: LRU, FIFO, PLRU, … (a simulator sketch follows)
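
To make the definition concrete, here is a small simulator sketch; the encoding is mine, and the tables shown are LRU's permutations for associativity 4 (FIFO is noted in the comments):

```c
// Sketch of a permutation-policy simulator (my own encoding, not code from
// the talk). A cache set is an array of A block ids kept in the policy's
// total order: index 0 is the front, index A-1 is the greatest block, i.e.
// the one a miss evicts. perm[j] gives the new position of the block that
// currently sits at position j.
#include <stdio.h>
#include <string.h>

#define A 4  // associativity

// LRU's "hit" permutations for A = 4: a hit at position i moves that block
// to the front and shifts positions 0..i-1 down by one. (FIFO would use the
// identity permutation for every hit instead.)
static const int hit_perm[A][A] = {
    {0, 1, 2, 3},   // hit at 0: order unchanged
    {1, 0, 2, 3},   // hit at 1
    {1, 2, 0, 3},   // hit at 2
    {1, 2, 3, 0},   // hit at 3
};
// The "miss" permutation shared by LRU and FIFO: the new block, placed at
// position A-1 after the eviction, moves to the front.
static const int miss_perm[A] = {1, 2, 3, 0};

static void apply_perm(int set[A], const int perm[A]) {
    int next[A];
    for (int j = 0; j < A; j++) next[perm[j]] = set[j];
    memcpy(set, next, sizeof next);
}

static int access_block(int set[A], int b) {  // returns 1 on hit, 0 on miss
    for (int i = 0; i < A; i++)
        if (set[i] == b) { apply_perm(set, hit_perm[i]); return 1; }
    set[A - 1] = b;                 // miss: evict the greatest block
    apply_perm(set, miss_perm);
    return 0;
}

int main(void) {
    int set[A] = {-1, -1, -1, -1};  // empty set; -1 marks an invalid block
    const int seq[] = {1, 2, 3, 4, 1, 5};
    for (size_t k = 0; k < sizeof seq / sizeof seq[0]; k++)
        printf("access %d: %s\n", seq[k],
               access_block(set, seq[k]) ? "hit" : "miss");
    return 0;
}
```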

  16. Permutation Policy LRU LRU (least-recently used): order on blocks based on recency of last access. On a hit, the accessed block moves to the most-recently-used position; on a miss, the new block is inserted there and the least-recently-used block is evicted. [Diagram: “hit” permutation 3 and the “miss” permutation]

  17. Permutation Policy FIFO FIFO (first-in first-out): order on blocks based on order of cache misses. Hits leave the order unchanged; on a miss, the new block is inserted at the last-in position and the first-in block is evicted. [Diagram: “hit” permutation 3 and the “miss” permutation]

  18. Inferring Permutation Policies Strategy to infer permutation i: 1) Establish a known cache state s 2) Trigger permutation i 3) “Read out” the resulting cache state s’. Deduce permutation from s and s’.

  19. 1) Establishing a known cache state Assume the “miss” permutation of FIFO, LRU, and PLRU: accessing the uncached blocks a, b, c, d yields the known state [d c b a].

  20. 2) Triggering permutation i Simply access the ith block in the cache state. E.g., for i = 3, access c.

  21. 3) Reading out a cache state Exploit the “miss” permutation to determine the position of each of the original blocks: if the position is j, then A-j+1 misses will evict the block, but A-j misses will not. After 1 miss, c is still cached; after 2 misses, c is not cached. A sketch of this read-out follows.
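
In code, the read-out is a linear search over the number of misses. The helpers reset_to_state, trigger_misses, and probe_cached below are hypothetical placeholders (not from the talk) for re-establishing the state, forcing m misses, and testing whether a block is still cached, e.g. by timing one access:

```c
// Sketch of the read-out step, on top of hypothetical helpers.
#define A 4                        // associativity, assumed

void reset_to_state(void);         // hypothetical: redo steps 1) and 2)
void trigger_misses(int m);        // hypothetical: m accesses to fresh blocks
int  probe_cached(int block);      // hypothetical: 1 if block is still cached

// Returns the position j of `block` in the policy's order, using the
// slide's observation: A-j+1 misses evict it, but A-j misses do not.
int position_of(int block) {
    for (int m = 1; m <= A; m++) {
        reset_to_state();          // fresh copy of the state to read out
        trigger_misses(m);
        if (!probe_cached(block))
            return A - m + 1;      // smallest evicting m satisfies m = A-j+1
    }
    return -1;                     // block was not in the set at all
}
```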

  22. Inferring Permutation Policies: Example of LRU, Permutation 3 1) Establish the known state, 2) trigger permutation 3 by accessing c, 3) read out the resulting state. [Diagram: cache states ordered from most-recently-used to least-recently-used]

  23. Implementation Challenges • Interference • Prefetching • Instruction Caches • Virtual Memory • L2+L3 Caches: • Strictly-inclusive • Non-inclusive • Exclusive • Shared Caches: • Coherence
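
For the prefetching challenge in the list above, a common countermeasure in such microbenchmarks (the talk does not spell out the authors' approach) is pointer chasing through a randomly permuted buffer: accesses then have no predictable stride, and each load depends on the previous one:

```c
// Sketch: pointer chasing through a randomly permuted buffer. The dependent
// loads defeat stride prefetchers; sizes and the shuffle are illustrative
// assumptions.
#include <stdlib.h>
#include <stddef.h>

#define NBLOCKS 512
#define STRIDE  64            // one pointer per assumed cache block

// Build a random cyclic chain with one pointer slot per block.
void **build_chain(char *buf) {
    size_t idx[NBLOCKS];
    for (size_t i = 0; i < NBLOCKS; i++) idx[i] = i;
    for (size_t i = NBLOCKS - 1; i > 0; i--) {   // Fisher-Yates shuffle
        size_t j = (size_t)rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i < NBLOCKS; i++)         // link slot i -> next slot
        *(void **)(buf + idx[i] * STRIDE) =
            buf + idx[(i + 1) % NBLOCKS] * STRIDE;
    return (void **)(buf + idx[0] * STRIDE);
}

// Each load's address depends on the previous load, one access per block.
void *chase(void **p, long n) {
    while (n--) p = (void **)*p;
    return p;                 // returned so the loop is not optimized away
}

int main(void) {
    char *buf = malloc(NBLOCKS * STRIDE);  // malloc is suitably aligned
    void **start = build_chain(buf);
    chase(start, 10 * NBLOCKS);   // wrap this call in the chosen measurement
    free(buf);
    return 0;
}
```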

  24. Experimental Results • Undocumented variant of LRU • Surprising measurements ;-) • Nehalem introduced an “L0” micro-op buffer • Exclusive hierarchy

  25. L2 Caches on some Core 2 Duos [Plot: |misses| over n] • Core 2 Duo E6300: 2 MB, 8 ways • Core 2 Duo E6750: 4 MB, 16 ways • Core 2 Duo E8400: 6 MB, 24 ways All three have 4096 cache sets; they differ in associativity.

  26. Replacement Policy of the Intel Atom D525 Discovered a, to our knowledge, undocumented “hierarchical” policy: ATOM = LRU(3, LRU(2)); compare PLRU(2k) = LRU(2, PLRU(k)).
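
Under one plausible reading of this notation (my interpretation, not an official description of the Atom D525), LRU(3, LRU(2)) keeps 3 LRU-ordered groups of 2 LRU-ordered blocks per set, and a miss evicts the least-recently-used block of the least-recently-used group:

```c
// Sketch of a hierarchical LRU(3, LRU(2)) set, under my reading of the
// notation. Group 0 and slot 0 are the most recently used.
#include <stdio.h>

#define GROUPS 3
#define PER_GROUP 2

typedef struct { int blk[GROUPS][PER_GROUP]; } Set;

static void move_group_to_front(Set *s, int g) {
    int tmp[PER_GROUP];
    for (int j = 0; j < PER_GROUP; j++) tmp[j] = s->blk[g][j];
    for (int i = g; i > 0; i--)
        for (int j = 0; j < PER_GROUP; j++) s->blk[i][j] = s->blk[i-1][j];
    for (int j = 0; j < PER_GROUP; j++) s->blk[0][j] = tmp[j];
}

// Move the block at (g, j) to the front of its group, then the group to the
// front of the group order: it becomes the most recently used overall.
static void promote(Set *s, int g, int j) {
    int b = s->blk[g][j];
    for (int k = j; k > 0; k--) s->blk[g][k] = s->blk[g][k-1];
    s->blk[g][0] = b;
    move_group_to_front(s, g);
}

static int access_block(Set *s, int b) {  // returns 1 on hit, 0 on miss
    for (int g = 0; g < GROUPS; g++)
        for (int j = 0; j < PER_GROUP; j++)
            if (s->blk[g][j] == b) { promote(s, g, j); return 1; }
    s->blk[GROUPS-1][PER_GROUP-1] = b;   // evict LRU block of LRU group
    promote(s, GROUPS-1, PER_GROUP-1);   // new block becomes most recent
    return 0;
}

int main(void) {
    Set s;
    for (int g = 0; g < GROUPS; g++)
        for (int j = 0; j < PER_GROUP; j++) s.blk[g][j] = -1;  // empty set
    const int seq[] = {1, 2, 3, 4, 5, 6, 1, 7};
    for (size_t k = 0; k < sizeof seq / sizeof seq[0]; k++)
        printf("access %d: %s\n", seq[k],
               access_block(&s, seq[k]) ? "hit" : "miss");
    return 0;
}
```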

  27. Conclusions and Future Work • Inference of cache models: • More general classes of replacement policies, e.g., by inferring canonical register automata • Shared caches in multicores, coherence protocols, etc. • Deal with other architectural features: translation lookaside buffers, branch predictors, prefetchers • Infer abstractions instead of “concrete” models

  28. Questions? A. Abel and J. Reineke. Measurement-based Modeling of the Cache Replacement Policy. RTAS, 2013 (to appear).
