Exploring Improved Cache Organizations Based on Page-Level Contention Behavior

Sriram Vajapeyam, Independent Consultant, June 2002

A New Angle on Improving Caches
• Contention for a Cache Block:
  • Is there a pattern at the Virtual Page level?
  • Yes! At least for SPECInt2000:
    • A few pages/page groups contend repeatedly
    • Contending Virtual Addresses differ in just a few bits
• How do we exploit this?
(c) S. Vajapeyam

Exploiting Page-Level Contention Behavior
• Better Choice of Index bits
• Reducing Tag Overheads:
  • New Cache Organization: Sub-Tagged Caches
  • Set-Associative Caches
  • Decoupled Sector Caches
+ Cache-Conscious Virtual Address space allocation

Talk Outline
• Motivation
• Previous Approaches
Page-Level Contention Behavior
• Tag-bit Contention Patterns
• Further Reducing # of Contending Tag-bits
Exploiting Tag-bit Contention Patterns
• Better Cache Indexing
• Sub-tagged DSC and Set-Associative Caches
• Sub-tagged Caches (Direct-Mapped)

Motivation: Further Cache Work
• Processor-Memory Speed Disparity
• Wire Delay
  • smaller caches important for access speed
• Very Deep Processor Pipelines
  • smaller caches important for access speed
• Low Power
  • tag overheads (especially 64-bit addr)
  • smaller or banked caches
We look at today’s L1 caches

Previous Cache Approaches
Rich body of work exploits program behavior:
• Locality:
  • Temporal, Spatial; Structural [Hsu & Sohi]
• Conflict Patterns:
  • Physical Page Coloring [Bershad '94]
  • Cache Indexing [Gonzalez '97, Agarwal '92, etc.]
  • Page-to-Bank Allocation [Vijay '01]
• Data Access Timing:
  • Cachelets [Shen '01]

A Different Approach: Tag Contention Patterns
• Differences between the Tag Bits of the two blocks involved in a replacement miss: Repetitive, Limited
  • e.g. only tag bits 28 and 16 differ between the replaced and the fetched block
  • => only tag bits 28 and 16 conflict or contend
  • this happens repeatedly
Replacement Miss:
• Replaces a live block (valid & to be used again)
• Caused by a contending access (any of the 3 Cs: compulsory, capacity, conflict)
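The measurement described on this slide can be sketched in a few lines. This is only an illustration, not the study's actual tooling: the function names `replacement_miss_masks` and `bits_of` are hypothetical, bit 0 is assumed to be the address LSB, and a replaced block is treated as "live" if it is referenced again later in the trace.

```python
from collections import Counter


def replacement_miss_masks(trace, cache_bytes=16 * 1024, block_bytes=32):
    """Count tag-bit difference masks over the replacement misses of a
    direct-mapped cache. Masks are aligned to address bit positions."""
    offset_bits = block_bytes.bit_length() - 1
    num_sets = cache_bytes // block_bytes
    index_bits = num_sets.bit_length() - 1

    # Assumption: a block is "live" if it is referenced again later,
    # so record the last trace position at which each block is used.
    last_use = {}
    for t, addr in enumerate(trace):
        last_use[addr >> offset_bits] = t

    resident = {}      # set index -> block number currently cached
    masks = Counter()  # tag XOR mask -> number of replacement misses
    for t, addr in enumerate(trace):
        block = addr >> offset_bits
        index = block & (num_sets - 1)
        old = resident.get(index)
        if old is not None and old != block and last_use[old] > t:
            tag_xor = (old ^ block) >> index_bits
            masks[tag_xor << (index_bits + offset_bits)] += 1
        resident[index] = block
    return masks


def bits_of(mask, addr_bits=32):
    """List the address bit positions set in a mask."""
    return [b for b in range(addr_bits) if (mask >> b) & 1]


# Two addresses that map to the same set and differ only in bits 16 and 28.
A = 0x00000040
B = A | (1 << 28) | (1 << 16)
masks = replacement_miss_masks([A, B, A, B])
for mask, count in masks.items():
    print(bits_of(mask), count)
```

In this toy trace both live replacements produce the same mask, which is the kind of repetition the slide refers to.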

Study Framework
• IBM RS6000, 32-bit Virtual Addresses
• IBM xlc compiler, -O3 optimization
• SPEC Integer 2000
• Data Refs of 200M insts after discarding startup phase
  • (Validated against some 2B traces)
• L1 Caches, Direct-Mapped, Virtual Address
  • 8K, 16K (32B block); 32K, 64K (64B block)
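For reference, the address-field split implied by these cache configurations can be worked out as below. The helper name `cache_fields` is hypothetical, and the numbering assumes bit 0 is the least significant address bit, which appears consistent with the tag-bit numbers quoted on the other slides.

```python
def cache_fields(cache_bytes, block_bytes, addr_bits=32):
    """Return (offset_bits, index_bits, tag_bits) for a direct-mapped cache
    with power-of-two size and block size."""
    num_blocks = cache_bytes // block_bytes
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_blocks.bit_length() - 1
    tag_bits = addr_bits - offset_bits - index_bits
    return offset_bits, index_bits, tag_bits


# The four configurations listed in the study framework.
for size_kb, block in [(8, 32), (16, 32), (32, 64), (64, 64)]:
    off, idx, tag = cache_fields(size_kb * 1024, block)
    print(f"{size_kb}KB/{block}B: offset={off} index={idx} tag={tag} "
          f"lowest tag bit={off + idx}")
```

Under this numbering the 16KB cache has its lowest tag bit at position 14 and the 64KB cache at position 16.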

Caveats
• SPEC
• SPECInt
• Dynamically-linked Code – can be different
• Just one compiler-machine platform (IBM)
  • we know there are differences with Alpha, SUN

Contention Participation of Individual Tag Bits
• Different Tag Bits contribute differently to contention
  • not only the LSBs of the tag are important
  • some bits hardly ever contribute
• Some bits stand out: e.g. stack/heap bit
Can “compress” tag representation
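One plausible way to compute per-bit participation is sketched below. It assumes participation means "the fraction of replacement misses in which that bit differs between the replaced and the fetched block"; the slides do not spell out the exact metric. The function name `bit_participation` and the sample counts are made up for illustration.

```python
from collections import Counter


def bit_participation(mask_counts, low_tag_bit=14, addr_bits=32):
    """Map each tag bit position to the fraction of replacement misses
    whose difference mask has that bit set. Masks are assumed to be
    aligned to address bit positions (bit 0 = LSB)."""
    total = sum(mask_counts.values())
    result = {}
    for bit in range(low_tag_bit, addr_bits):
        hits = sum(c for m, c in mask_counts.items() if (m >> bit) & 1)
        result[bit] = hits / total if total else 0.0
    return result


# Hypothetical counts: 3 misses differ in bits 16 and 28, 1 only in bit 16.
sample = Counter({(1 << 28) | (1 << 16): 3, (1 << 16): 1})
participation = bit_participation(sample)
print(participation[16], participation[28], participation[14])
```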

Cumulative Contention Participation of Tag Bits
• 6 tag bits account for a large majority of replacements
  • > 90% for 6 benchmarks [16KB DM Cache]
  • > 80% for 3 benchmarks
• LSB tag bits: 5 bits account for > 80% in 5 benchmarks
=> Important to consider MSB tag bits also, not just LSB, e.g. in XOR indexing schemes

Groups of Contending Tag-bits
• Several tag-bits contend together, not individually
  • e.g. bits 15,16,17,18 all differ simultaneously
  => particular pages/page-groups contend repeatedly
• Examples:
  gzip: 17
  perlbmk: 14,18-19,21-22,24-28
  vpr: 14-16
  vortex: 14-16
  eon: 14,18-21,24-28
  gap: 17-18,20,22,24-28
  crafty: 17,18,21,23-28
  parser: 15-17,20-21,23-28

Contribution of Top-10 Tag-bit Groups

Benchmark   16KB Cache   64KB Cache
Eon         83.68 %      99.47 %
Gzip        58.63 %      98.20 %
Perlbmk     59.69 %      94.29 %
Vpr         60.84 %      93.77 %
Gap         77.01 %      91.87 %
Crafty      49.24 %      78.82 %
Twolf       23.18 %      74.61 %
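A figure like "contribution of the top-10 groups" could be derived from a histogram of tag-difference masks roughly as follows. This is a sketch under assumptions: a "group" is taken to be one distinct set of simultaneously differing tag bits, and the names `top_group_share` and `format_group` plus the sample counts are hypothetical.

```python
from collections import Counter


def top_group_share(mask_counts, n=10):
    """Fraction of replacement misses covered by the n most frequent
    tag-bit groups (distinct difference masks)."""
    total = sum(mask_counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in mask_counts.most_common(n)) / total


def format_group(bits):
    """Render bit positions in compact range notation, e.g. '14,18-19'."""
    runs = []
    for b in sorted(bits):
        if runs and b == runs[-1][1] + 1:
            runs[-1][1] = b          # extend the current run
        else:
            runs.append([b, b])      # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


# Hypothetical histogram of difference masks (values are made up).
sample = Counter({0b111: 6, 0b001: 3, 0b010: 1})
print(top_group_share(sample, n=2))
print(format_group([14, 18, 19, 21, 22, 24, 25, 26, 27, 28]))
```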

Summary of Page-Level Behavior
• Different tag-bits participate differently in contentions
  • 5-6 tag-bits cumulatively account for a large majority
• Groups of tag-bits participate together in contention
  • => particular pages/page-groups contend frequently
- How do we exploit this?
- Can we further reduce # of contending tag bits?

Further Reducing # Contending Tag Bits
• Cache-Conscious Compiler: suitable VA Space allocation
  • e.g. IBM stack-heap contention
    • relocate base of stack or heap
  • similarly for program data-structures
  • profile-driven?

Further Reducing # Contending Tag Bits
• Relocation can be
  • within the VA
  • to an extended VA, e.g. add 2 bits to VA only at L1 cache
[Diagram labels: 32-bit VA; VA -> V-V TLB -> Relocated-VA; V-V TLB filled by Snoop Logic or Dynamic Optimizer/Compiler]

Exploiting Tag Contention Patterns
• Improve Cache Performance (Hit Rate)
• Reduce Tag Overheads

Better Cache Indexing
• Use frequently conflicting tag bits in index instead
  • e.g. bit 17 instead of bits 14, 15 for gzip, 64KB cache
  • bit 28 (stack/heap bit) for some benchmarks
• Different from XOR indexing:
  • XOR scatters refs across entire cache
  • tag bit can be used to choose a cache bank instead
• Dynamic Index Selection (at least some bits) possible
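The idea of substituting a contending tag bit into the index can be illustrated as below. The helper `index_from_bits` is hypothetical, and swapping bit 15 for bit 17 is just one possible reading of the gzip example; the slide does not say exactly which index bit is displaced. A real design would also have to keep the displaced bit in the stored tag.

```python
def index_from_bits(addr, bit_positions):
    """Build a set index by gathering the given address bits
    (bit 0 = LSB); the first position becomes index bit 0."""
    index = 0
    for i, b in enumerate(bit_positions):
        index |= ((addr >> b) & 1) << i
    return index


# 64KB direct-mapped cache, 64B blocks: conventional index is bits 6..15.
conventional = list(range(6, 16))
# Illustrative alternative: drop bit 15 from the index, use tag bit 17.
modified = list(range(6, 15)) + [17]

a = 0x00000040
b = a | (1 << 17)  # differs from a only in tag bit 17

print(index_from_bits(a, conventional), index_from_bits(b, conventional))
print(index_from_bits(a, modified), index_from_bits(b, modified))
```

With the conventional index the two addresses collide in one set; with bit 17 in the index they land in different sets, and only those two locations are affected rather than the whole cache being rehashed as with XOR indexing.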

Sub-Tagged Caches: Saving Tag Bits
Cache Block: tag = main tag + sub-tag
[Diagram labels: main tag; individual sub-tags; sub-blocks]
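The slide gives only the outline of the organization, so the following is a speculative sketch of how a lookup might work, not the design presented in the talk. It assumes each block frame stores one main tag plus a short sub-tag per sub-block, that the sub-tag holds a few frequently contending tag bits (bits 16 and 28 here, chosen arbitrarily), and that a main-tag mismatch invalidates the whole frame. The class name `SubTaggedCache` and all parameters are hypothetical.

```python
class SubTaggedCache:
    """Toy direct-mapped cache: one main tag per frame, one sub-tag per
    sub-block. Tracks tags only, no data."""

    def __init__(self, num_frames=512, block_bytes=32, sub_blocks=4,
                 sub_tag_bits=(16, 28)):
        self.num_frames = num_frames
        self.block_bytes = block_bytes
        self.sub_bytes = block_bytes // sub_blocks
        self.offset_bits = block_bytes.bit_length() - 1
        self.low_tag_bit = self.offset_bits + num_frames.bit_length() - 1
        # Assumed: sub-tag bits are tag bits (positions >= low_tag_bit).
        self.sub_mask = sum(1 << b for b in sub_tag_bits)
        self.main_tags = [None] * num_frames
        self.sub_tags = [[None] * sub_blocks for _ in range(num_frames)]

    def access(self, addr):
        """Return True on a hit, False on a miss (and fill on a miss)."""
        index = (addr >> self.offset_bits) % self.num_frames
        slot = (addr % self.block_bytes) // self.sub_bytes
        sub_tag = addr & self.sub_mask
        tag_part = (addr >> self.low_tag_bit) << self.low_tag_bit
        main_tag = tag_part & ~self.sub_mask
        if self.main_tags[index] != main_tag:
            # Main-tag mismatch: replace the whole frame.
            self.main_tags[index] = main_tag
            self.sub_tags[index] = [None] * len(self.sub_tags[index])
            self.sub_tags[index][slot] = sub_tag
            return False
        if self.sub_tags[index][slot] != sub_tag:
            # Same main tag, different sub-tag: replace one sub-block.
            self.sub_tags[index][slot] = sub_tag
            return False
        return True


cache = SubTaggedCache()
a = 0x00000040                 # frame 2, sub-block 0
b = 0x10010048                 # frame 2, sub-block 1, differs in bits 16, 28
results = [cache.access(x) for x in (a, b, a, b)]
print(results)
```

In this sketch the two addresses would evict each other in a conventional direct-mapped cache, but here they share a frame because they differ only in sub-tag bits and touch different sub-blocks. Addresses that hit the same sub-block would still conflict.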

Sub-Tagged Caches: Variable-Size Blocks!
sub-tags + sub-block prefetch =? Variable-Size Block
[Diagram labels: main tag; emulated block]

Set-Associative Sub-Tagged Caches
Common Main Tag for Set:
[Diagram labels: Main tag; Sub-tags; Set1, Set2, Set3]

Sub-Tagged DSC (Decoupled Sector Caches)
Share Main Tag across Sector:
[Diagram labels: main tag; sub-tags; Sector]

Acknowledgements
• Portions are Joint Work with:
  • Siddhartha Tambat (TCCA C.A. Letters, July 2002)
  • S. Muthulaxmi (M.S. Thesis, “A Study of Variable Block Size Caches”, Indian Institute of Science, Oct. 1997)
• Intel MRL Equipment Grant

Summary
• “Replacement Misses” exhibit patterns in tag-bit conflicts:
  • A few tag bits dominant
  • Group participation of tag-bits (i.e. pages/page-groups)
• Compiler/Hardware can enhance tag conflict patterns
• Possible Applications:
  • Better Cache Indexing
  • Sub-tagged Caches
