Using Partial Tag Comparison in Low-Power Snoop-based Chip Multiprocessors

Ali Shafiee, Narges Shahidi, Amirali Baniasadi (Sharif University of Technology; University of Victoria). This work: improving snoop coherency. Goal: improving energy efficiency in snoop-based CMPs.


Presentation Transcript


  1. Using Partial Tag Comparison in Low-Power Snoop-based Chip Multiprocessors. Ali Shafiee, Narges Shahidi, Amirali Baniasadi. Sharif University of Technology; University of Victoria.

  2. This Work: Improving Snoop Coherency. Goal: improving energy efficiency in snoop-based CMPs. Motivation: broadcasting and processing the entire tag is inefficient. Our solution: using Partial Tag Comparison (PTC) prior to snoop. Key results: performance (2.9%), tag array power (52%), bandwidth utilization (78.5%).

  3. Our Solution (PTC) vs. Conventional. [Figure: two diagrams, "Conventional" and "Our solution", each showing data caches (D$) connected through an interconnect to an upper-level cache.] Our solution: Fast ++ (early miss detection); Power & Bandwidth Efficient +. Conventional: Fast +; Power & Bandwidth −.

  4. Conventional Snooping. [Figure: four CPUs, each with a D$, attached to an address bus and a command bus with a snoop bus controller; steps numbered 1 to 5.] Redundant (miss): ~70%.

  5. Snoop Filters. Goal: eliminate redundant snoop requests. Examples: RegionScout (ISCA’05), CGCT (ISCA’05), SSP (ASPLOS’08). PTC: (1) early miss detection using a subset of tag bits; (2) once a miss is detected, the snoop is avoided. How often is that possible?

  6. How often are n tag bits enough to detect a miss? 95+% of misses can be detected using 8 bits.
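
The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' hardware: the function names `partial_tag` and `is_definite_miss` and the example tag values are hypothetical, and the only figure taken from the slides is the use of 8 low-order tag bits. If no cached tag shares the request's low-order bits, the full tags cannot match either, so the miss is known early. A partial match only means "possibly present" and still needs a full comparison.

```python
PARTIAL_BITS = 8  # number of low-order tag bits compared (value from the slide)


def partial_tag(tag: int, bits: int = PARTIAL_BITS) -> int:
    """Return the low-order `bits` bits of a tag."""
    return tag & ((1 << bits) - 1)


def is_definite_miss(request_tag: int, cached_tags: list[int],
                     bits: int = PARTIAL_BITS) -> bool:
    """True when no cached tag shares the request's partial tag.

    A True result guarantees a full-tag miss. A False result only means
    the line may be present; a full comparison is still required.
    """
    wanted = partial_tag(request_tag, bits)
    return all(partial_tag(t, bits) != wanted for t in cached_tags)


if __name__ == "__main__":
    # Hypothetical tags stored in the four ways of one set.
    ways = [0x1A2B3, 0x0FF10, 0x7C4D5, 0x33E77]
    print(is_definite_miss(0x5AB99, ways))  # True: no way ends in 0x99
    print(is_definite_miss(0x999B3, ways))  # False: 0xB3 matches, full check needed
```

The second call shows the cost of comparing fewer bits: the full tags differ, but the partial comparison cannot rule the line out.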

  7. PTC-Filter. [Figure: a PTC-Filter next to the D$ compares tag LSBs taken from the address bus.] Miss: avoid snoop, access upper level. Hit: snoop potential targets.

  8. PTC-Filter. [Figure: four cores (0 to 3), each with a 4-way D$ and a PTC-Filter. A filter holds sections labelled Core1’s LSB, Core2’s LSB, Core3’s LSB, …; each entry has the fields LSB (8 bits), D, and V.]
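
One possible software model of the structure in this figure is shown below. It is a sketch under stated assumptions, not the authors' design: the class names `FilterEntry` and `PTCFilter`, the method `potential_targets`, the per-set indexing, and the reading of V as a valid bit are all assumptions. The D field is stored but not interpreted, since the slides do not define it. The 8-bit LSB width and the 4-way organisation come from the slide.

```python
from dataclasses import dataclass


@dataclass
class FilterEntry:
    lsb: int = 0     # low-order tag bits of a remote cache line (8 bits on the slide)
    d: bool = False  # "D" field from the slide; meaning not given, kept opaque here
    v: bool = False  # "V" field; assumed to be a valid bit


class PTCFilter:
    """Per-core filter mirroring the partial tags held by remote caches (assumed layout)."""

    def __init__(self, remote_cores, num_sets, ways=4, bits=8):
        self.mask = (1 << bits) - 1
        # table[core][set_index][way] -> FilterEntry
        self.table = {
            core: [[FilterEntry() for _ in range(ways)] for _ in range(num_sets)]
            for core in remote_cores
        }

    def potential_targets(self, set_index, tag):
        """Remote cores whose filter section has a valid partial-tag match.

        An empty list means every remote cache is a known miss,
        so the snoop can be skipped entirely.
        """
        wanted = tag & self.mask
        return [
            core
            for core, sets in self.table.items()
            if any(e.v and e.lsb == wanted for e in sets[set_index])
        ]


if __name__ == "__main__":
    f = PTCFilter(remote_cores=[1, 2, 3], num_sets=4)
    f.table[2][0][1] = FilterEntry(lsb=0xB3, v=True)
    print(f.potential_targets(0, 0x1A2B3))  # [2]: only core 2 needs to be snooped
    print(f.potential_targets(0, 0x5AB99))  # []: snoop avoided
```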

  9. PTC: Filter Miss. [Figure: four CPUs with D$s on the address bus and command bus with a snoop bus controller; steps numbered 1 to 3.]

  10. PTC: Filter Hit. [Figure: same system as the previous slide; steps numbered 1 to 6, with ✓ and ✗ marks on the D$s.]

  11. Filter Maintenance. [Figure: Core 0 and Core i, each with a CPU, snoop controller, pending request table, and PTC-Filter, on the command bus and address bus; steps numbered 1 to 6.] Labels: "Request = A"; "miss A, place it in position of tag F"; "Place A, insert in Way 1 of core 0"; bus message {Address=A, C=0, W=1, D=1}.
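
The bus message in this figure suggests how remote filters could be kept in sync. The sketch below is one hedged reading, not the authors' protocol: it assumes C is the requesting core, W is the way the line is placed in, and D is the D bit of the filter entry. The function name `apply_update`, the dictionary layout, and the cache geometry constants are hypothetical.

```python
# Assumed cache geometry, for illustration only (not given in the slides).
OFFSET_BITS = 6   # 64-byte lines
INDEX_BITS = 7    # 128 sets
LSB_BITS = 8      # partial-tag width from the slides


def apply_update(filter_table, address, core, way, d):
    """Record in a remote core's filter that `core` placed `address` in `way`.

    filter_table maps (core, set_index, way) to an entry dict. The new
    partial tag overwrites whatever the replaced line had stored there.
    """
    set_index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    filter_table[(core, set_index, way)] = {
        "lsb": tag & ((1 << LSB_BITS) - 1),
        "d": d,
        "v": True,
    }
    return set_index


if __name__ == "__main__":
    table = {}
    # Hypothetical address A with tag 0xABCD mapping to set 5.
    address_a = (0xABCD << (OFFSET_BITS + INDEX_BITS)) | (5 << OFFSET_BITS)
    # Mirrors the slide's message {Address=A, C=0, W=1, D=1}.
    apply_update(table, address_a, core=0, way=1, d=1)
    print(table[(0, 5, 1)])
```

Because the way number travels with the message, a remote filter can overwrite the exact entry of the evicted line without a separate invalidation step; that is an inference from the figure labels, not a claim made in the slides.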

  12. Methodology
      • SESC simulator, 4-way CMP
      • SPLASH-2 benchmarks
      • CACTI 6.0

  13. Performance. [Chart] Average: 2.9%

  14. Bandwidth. [Chart] Average: 78.5%

  15. Tag Power. [Chart] Average: 52%

  16. Discussion
      • Why do benchmarks show different performance improvements?
          • Different cache miss frequency
          • Different early miss detection frequency
          • Not all cache misses are on the critical path
      • Filter overhead:
          • Timing: 1 cycle
          • Power: 78.5% of single tag array access

  17. Summary
      • PTC:
          • Using a subset of tag bits to improve bandwidth/power efficiency.
      • Results:
          • Performance: 2.9%
          • Tag Power: 52%
          • Bandwidth: 78.5%

  18. Global vs. Local Miss. [Figure: two diagrams, "Global Miss" and "Local Miss", each showing D$s connected through an interconnect to an upper-level cache; caches are asked "Have B?" and answer NO or YES.]
      • Local miss detection → better power/bandwidth profile
      • Remote miss detection (source-based approach) vs. (destination-based filter)

  19. Partial tag lookup: global miss

  20. Partial tag lookup: local miss
