
Exploring Improved Cache Organizations Based on Page-Level Contention Behavior

Sriram Vajapeyam, Independent Consultant, June 2002

Presentation Transcript


  1. Exploring Improved Cache Organizations Based on Page-Level Contention Behavior • Sriram Vajapeyam, Independent Consultant • June 2002

  2. A New Angle on Improving Caches • Contention for a Cache Block: • Is there a pattern at the Virtual Page level? • Yes! At least for SPECInt2000: • A few pages/page groups contend repeatedly • Contending Virtual Addresses differ in just a few bits • How do we exploit this? (c) S. Vajapeyam

  3. Exploiting Page-Level Contention Behavior • Better Choice of Index bits • Reducing Tag Overheads: • New Cache Organization: Sub-Tagged Caches • Set-Associative Caches • Decoupled Sector Caches + Cache-Conscious Virtual Address space allocation

  4. Talk Outline • Motivation • Previous Approaches • Page-Level Contention Behavior: • Tag-bit Contention Patterns • Further Reducing # of Contending Tag-bits • Exploiting Tag-bit Contention Patterns: • Better Cache Indexing • Sub-tagged DSC and Set-Associative Caches • Sub-tagged Caches (Direct-Mapped)

  5. Motivation: Further Cache Work • Processor-Memory Speed Disparity • Wire Delay • smaller caches important for access speed • Very Deep Processor Pipelines • smaller caches important for access speed • Low Power • tag overheads (especially with 64-bit addresses) • smaller or banked caches • We look at today’s L1 caches

  6. Previous Cache Approaches • A rich body of work exploits program behavior: • Locality: • Temporal, Spatial; Structural [Hsu & Sohi] • Conflict Patterns: • Physical Page Coloring [Bershad ’94] • Cache Indexing [Gonzalez ’97, Agarwal ’92, etc.] • Page-to-Bank Allocation [Vijay ’01] • Data Access Timing: • Cachelets [Shen ’01]

  7. A Different Approach: Tag Contention Patterns • Differences between the tag bits of the blocks involved in replacement misses are repetitive and limited • e.g. only tag bits 28 and 16 differ between the replaced block and the fetched block • => only tag bits 28 and 16 conflict (contend) • this happens repeatedly • Replacement Miss: • Replaces a live block (valid and to be used again) • Caused by a contending access (any of the 3 Cs)
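The measurement described on this slide can be sketched as a small trace-driven simulation. This is a minimal sketch, not the author's tool: the function names, the two-pass liveness check (a block counts as live if it is referenced again later in the trace), and the convention that bit 0 is the least significant address bit are all assumptions made for illustration.

```python
def replacement_diffs(addresses, size_bytes=16 * 1024, block_bytes=32):
    """Return one XOR mask (in address bit positions) per replacement miss.

    Assumptions: direct-mapped cache, power-of-two sizes, and a replaced
    block is "live" if the trace references it again later.
    """
    offset_bits = (block_bytes - 1).bit_length()
    num_sets = size_bytes // block_bytes

    # First pass: remember the last position at which each block is used.
    last_use = {}
    for i, addr in enumerate(addresses):
        last_use[addr >> offset_bits] = i

    cache = {}   # set index -> resident block number
    diffs = []
    for i, addr in enumerate(addresses):
        block = addr >> offset_bits
        idx = block & (num_sets - 1)
        resident = cache.get(idx)
        if resident is not None and resident != block:
            if last_use[resident] > i:           # the victim is still live
                # Index bits are equal, so only tag bits survive the XOR.
                diffs.append((resident ^ block) << offset_bits)
        cache[idx] = block
    return diffs


def differing_bits(mask):
    """List the bit positions set in a mask, lowest first."""
    return [b for b in range(mask.bit_length()) if (mask >> b) & 1]


# Hypothetical three-reference trace: two addresses that differ only in
# bits 28 and 16 and therefore share a set in a 16KB direct-mapped cache.
trace = [0x10000000, 0x00010000, 0x10000000]
masks = replacement_diffs(trace)
bits = differing_bits(masks[0])
```

In this toy trace the second reference evicts a block that is used again, so exactly one replacement miss is recorded, with bits 16 and 28 differing.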

  8. Study Framework • IBM RS6000, 32-bit Virtual Addresses • IBM xlc compiler, -O3 optimization • SPEC Integer 2000 • Data references of 200M instructions after discarding startup phase • (Validated against some 2B traces) • L1 Caches, Direct-Mapped, Virtual Address • 8K, 16K (32B block); 32K, 64K (64B block)
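For the four cache configurations listed, the split of a 32-bit virtual address into offset, index, and tag fields follows from simple arithmetic. The sketch below works it out; the `Geometry` class and its field names are hypothetical, and it assumes bit 0 is the least significant address bit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Direct-mapped cache geometry (hypothetical helper for illustration)."""
    size_bytes: int
    block_bytes: int
    va_bits: int = 32

    @property
    def offset_bits(self):
        return (self.block_bytes - 1).bit_length()

    @property
    def index_bits(self):
        return (self.size_bytes // self.block_bytes - 1).bit_length()

    @property
    def tag_lsb(self):
        # Lowest address bit that belongs to the tag.
        return self.offset_bits + self.index_bits

    @property
    def tag_bits(self):
        return self.va_bits - self.tag_lsb


CONFIGS = {
    "8K/32B": Geometry(8 * 1024, 32),
    "16K/32B": Geometry(16 * 1024, 32),
    "32K/64B": Geometry(32 * 1024, 64),
    "64K/64B": Geometry(64 * 1024, 64),
}

for name, g in CONFIGS.items():
    print(f"{name}: offset={g.offset_bits} index={g.index_bits} "
          f"tag bits {g.tag_lsb}..{g.va_bits - 1} ({g.tag_bits} bits)")
```

Under these assumptions the 16KB cache has its lowest tag bit at position 14 and the 64KB cache at position 16.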

  9. Caveats • SPEC • SPECInt • Dynamically-linked Code – can be different • Just one compiler-machine platform (IBM) • we know there are differences with Alpha, SUN

  10. Contention Participation of Individual Tag Bits • Different Tag Bits contribute differently to contention • not just LSBs of tag bits are important • some hardly ever contribute • Some bits stand out: e.g. stack/heap bit • Can “compress” tag representation

  13. Cumulative Contention Participation of Tag Bits • 6 tag bits account for a large majority of replacements • > 90% for 6 benchmarks [16KB DM Cache] • > 80% for 3 benchmarks • LSB tag bits: 5 bits account for > 80% in 5 benchmarks • => Important to consider MSB tag bits also, not just LSBs, e.g. in XOR indexing schemes
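One way to compute a cumulative participation curve like the one summarized here is a greedy selection over the per-replacement XOR masks. The slide does not define how a replacement is "accounted for", so this sketch assumes one possible reading: a replacement is covered once at least one of its differing bits has been chosen. The function name and that coverage rule are assumptions.

```python
from collections import Counter


def greedy_bit_coverage(diff_masks, max_bits=6):
    """Pick up to max_bits bit positions greedily by replacement coverage.

    Returns (chosen_bits, cumulative_coverage), where cumulative_coverage[k]
    is the fraction of replacements covered by the first k+1 chosen bits.
    """
    remaining = list(diff_masks)
    total = len(remaining)
    chosen, coverage = [], []
    while remaining and len(chosen) < max_bits:
        counts = Counter(
            b for m in remaining for b in range(m.bit_length()) if (m >> b) & 1
        )
        if not counts:          # only zero masks left, nothing to choose
            break
        # Most frequent bit wins; ties go to the lower bit position.
        best, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        chosen.append(best)
        remaining = [m for m in remaining if not (m >> best) & 1]
        coverage.append((total - len(remaining)) / total)
    return chosen, coverage


# Hypothetical masks, not measured data.
masks = [1 << 17] * 3 + [(1 << 14) | (1 << 15)] * 2 + [1 << 28]
chosen, coverage = greedy_bit_coverage(masks)
```

A stricter reading, in which every differing bit of a replacement must lie in the chosen set, would change only the filter that updates `remaining`.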

  15. Groups of Contending Tag-bits • Several tag-bits contend together, not individually • e.g. bits 15, 16, 17, 18 all differ simultaneously • => particular pages/page-groups contend repeatedly • Examples: • gzip: 17 • perlbmk: 14, 18-19, 21-22, 24-28 • vpr: 14-16 • vortex: 14-16 • eon: 14, 18-21, 24-28 • gap: 17-18, 20, 22, 24-28 • crafty: 17, 18, 21, 23-28 • parser: 15-17, 20-21, 23-28

  16. Contribution of Top-10 Tag-bit Groups

      Benchmark   16KB Cache   64KB Cache
      Eon         83.68 %      99.47 %
      Gzip        58.63 %      98.20 %
      Perlbmk     59.69 %      94.29 %
      Vpr         60.84 %      93.77 %
      Gap         77.01 %      91.87 %
      Crafty      49.24 %      78.82 %
      Twolf       23.18 %      74.61 %
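A figure of this kind can be derived by treating each distinct set of simultaneously differing tag bits as one group and counting how often each group occurs. The sketch below assumes the input is a list of XOR masks, one per replacement miss; the function names and the sample masks are hypothetical, not the study's data.

```python
from collections import Counter


def format_group(mask):
    """Render a bit mask as a group label such as "14,18-19,24-28"."""
    bits = [b for b in range(mask.bit_length()) if (mask >> b) & 1]
    if not bits:
        return ""
    parts = []
    start = prev = bits[0]
    for b in bits[1:] + [None]:          # None flushes the final run
        if b is not None and b == prev + 1:
            prev = b
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = b
    return ",".join(parts)


def top_group_share(diff_masks, k=10):
    """Fraction of replacements that fall in the k most frequent groups."""
    if not diff_masks:
        return 0.0
    counts = Counter(diff_masks)
    return sum(c for _, c in counts.most_common(k)) / len(diff_masks)


# Hypothetical masks: three groups occurring 5, 3 and 2 times.
g1 = 1 << 17
g2 = (1 << 14) | (1 << 15) | (1 << 16)
g3 = sum(1 << b for b in (14, 18, 19, 21, 22, 24, 25, 26, 27, 28))
sample = [g1] * 5 + [g2] * 3 + [g3] * 2
share_top2 = top_group_share(sample, k=2)
```

Because a group is keyed by the full mask, two replacements belong to the same group only when exactly the same tag bits differ.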

  17. Summary of Page-Level Behavior • Different tag-bits participate differently in contentions • 5-6 tag-bits cumulatively account for a large majority • Groups of tag-bits participate together in contention • => particular pages/page-groups contend frequently • How do we exploit this? • Can we further reduce # of contending tag bits?

  18. Further Reducing # Contending Tag Bits • Cache-Conscious Compiler: suitable VA Space allocation • e.g. IBM stack-heap contention • relocate base of stack or heap • similarly for program data-structures • profile-driven?

  19. Further Reducing # Contending Tag Bits • Relocation can be • within the VA • to an extended VA, e.g. add 2 bits to the VA only at the L1 cache • V-V TLB maps the 32-bit VA to a Relocated-VA • Filled by Snoop Logic or Dynamic Optimizer/Compiler
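The extended-VA idea can be illustrated with a toy lookup table that attaches two extra high-order bits to selected pages. Everything here is an assumption made for illustration: the slide does not specify the page size, the table organization, or how the extra bits are chosen, and the `VVTlb` class is hypothetical.

```python
PAGE_BITS = 12   # assumption: 4KB pages (not stated in the talk)
VA_BITS = 32     # from the study framework: 32-bit virtual addresses


class VVTlb:
    """Toy virtual-to-virtual table: page number -> 2 extra address bits."""

    def __init__(self):
        self.extra = {}

    def fill(self, vpn, extra_bits):
        # In the talk this table is filled by snoop logic or a dynamic
        # optimizer/compiler; here the caller supplies the value directly.
        if not 0 <= extra_bits < 4:
            raise ValueError("only 2 extra bits are available")
        self.extra[vpn] = extra_bits

    def relocate(self, va):
        """Return the relocated VA: extra bits placed above bit 31."""
        extra = self.extra.get(va >> PAGE_BITS, 0)
        return (extra << VA_BITS) | va


tlb = VVTlb()
tlb.fill(0x2FF00, 1)                  # hypothetical contending page
moved = tlb.relocate(0x2FF00ABC)      # page has an entry
same = tlb.relocate(0x10000ABC)       # page has no entry, VA unchanged
```

The relocated address is only meaningful at the L1 cache, where the extra bits could take part in indexing or tagging.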

  20. Exploiting Tag Contention Patterns • Improve Cache Performance (Hit Rate) • Reduce Tag Overheads

  21. Better Cache Indexing • Use frequently conflicting tag bits in index instead • e.g. bit 17 instead of bits 14, 15 for gzip, 64KB cache • bit 28 (stack/heap bit) for some benchmarks • Different from XOR indexing: • XOR scatters refs across entire cache • tag bit can be used to choose a cache bank instead • Dynamic Index Selection (at least some bits) possible
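The effect of moving a frequently conflicting tag bit into the index can be shown with an index function built from an arbitrary list of address bit positions. This is a sketch under assumptions: a 64KB direct-mapped cache with 64B blocks (index bits 6 to 15), bit 0 as the least significant bit, and an illustrative choice to swap bit 15 for bit 17. The talk does not state exactly which index bit is displaced.

```python
def index_from_bits(addr, bit_positions):
    """Build a set index by gathering the given address bits, LSB first."""
    idx = 0
    for out_pos, va_bit in enumerate(bit_positions):
        idx |= ((addr >> va_bit) & 1) << out_pos
    return idx


# 64KB direct-mapped, 64B blocks: 6 offset bits, 10 index bits.
CONVENTIONAL = list(range(6, 16))          # address bits 6..15
SWAPPED = list(range(6, 15)) + [17]        # bit 17 replaces bit 15 (assumed)

# Two hypothetical addresses that differ only in bit 17.
a = 0x20000040
b = a ^ (1 << 17)

conv_a, conv_b = index_from_bits(a, CONVENTIONAL), index_from_bits(b, CONVENTIONAL)
swap_a, swap_b = index_from_bits(a, SWAPPED), index_from_bits(b, SWAPPED)
```

With the conventional index both addresses fall in the same set and evict each other; with the swapped index they land in different sets. Any address bit removed from the index, bit 15 in this sketch, has to be stored in the tag instead so that blocks remain uniquely identified.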

  22. Sub-Tagged Caches: Saving Tag Bits • Cache Block: tag = main tag + sub-tag • Figure labels: main tag, individual sub-tags, sub-blocks
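A possible reading of the sub-tagged organization is sketched below: each cache block stores one main tag shared by all of its sub-blocks, and each sub-block stores only a short sub-tag made of the frequently contending bits. The class, the choice of bits 28 and 16 as sub-tag bits, and the fill policy (a main-tag mismatch clears the whole block, a sub-tag mismatch replaces only that sub-block) are assumptions, not details taken from the talk.

```python
class SubTaggedCache:
    """Toy direct-mapped sub-tagged cache that tracks tags only, no data."""

    def __init__(self, size_bytes, block_bytes, sub_blocks, sub_tag_mask):
        self.block_bytes = block_bytes
        self.sub_blocks = sub_blocks
        self.sub_block_bytes = block_bytes // sub_blocks
        self.num_blocks = size_bytes // block_bytes
        self.offset_bits = (block_bytes - 1).bit_length()
        self.tag_lsb = self.offset_bits + (self.num_blocks - 1).bit_length()
        self.sub_tag_mask = sub_tag_mask   # must cover tag bits only
        self.main_tags = [None] * self.num_blocks
        self.sub_tags = [[None] * sub_blocks for _ in range(self.num_blocks)]

    def access(self, addr):
        """Return True on a hit; on a miss, install the sub-block."""
        idx = (addr >> self.offset_bits) & (self.num_blocks - 1)
        sb = (addr & (self.block_bytes - 1)) // self.sub_block_bytes
        tag = (addr >> self.tag_lsb) << self.tag_lsb
        sub = tag & self.sub_tag_mask
        main = tag & ~self.sub_tag_mask
        if self.main_tags[idx] != main:
            # Main tag mismatch: the whole block is given to the new main tag.
            self.main_tags[idx] = main
            self.sub_tags[idx] = [None] * self.sub_blocks
        hit = self.sub_tags[idx][sb] == sub
        self.sub_tags[idx][sb] = sub
        return hit


# 16KB cache, 32B blocks split into four 8B sub-blocks (assumed sizes).
cache = SubTaggedCache(16 * 1024, 32, 4, (1 << 28) | (1 << 16))
a = 0x20000000          # sub-block 0, sub-tag bits clear
b = 0x30010008          # same set and main tag, sub-block 1, bits 28 and 16 set
results = [cache.access(x) for x in (a, b, a, b)]
```

In this sketch `a` and `b` would evict each other in a conventional direct-mapped cache, but here they coexist in different sub-blocks of one block because they differ only in sub-tag bits.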

  23. Sub-Tagged Caches: Variable-Size Blocks! • sub-tags + sub-block prefetch =? Variable-Size Block • Figure labels: main tag, emulated block

  24. Set-Associative Sub-Tagged Caches • Common Main Tag for Set • Figure labels: sub-tags, main tag, Set 1, Set 2, Set 3

  25. Sub-Tagged DSC (Decoupled Sector Caches) • Share Main Tag across Sector • Figure labels: sub-tags, sector, main tag

  26. Acknowledgements • Portions are Joint Work with: • Siddhartha Tambat (TCCA C.A. Letters, July 2002) • S. Muthulaxmi (M.S. Thesis, “A Study of Variable Block Size Caches”, Indian Institute of Science, Oct. 1997) • Intel MRL Equipment Grant

  27. Summary • “Replacement Misses” exhibit patterns in tag-bit conflicts: • A few tag bits are dominant • Group participation of tag-bits (i.e. pages/page-groups) • Compiler/Hardware can enhance tag conflict patterns • Possible Applications: • Better Cache Indexing • Sub-tagged Caches
