LA-LRU: A Latency-Aware Replacement Policy for Variation Tolerant Caches
Aarul Jain, Cambridge Silicon Radio, Phoenix

Aviral Shrivastava, Arizona State University, Tempe

Chaitali Chakrabarti, Arizona State University, Tempe

Introduction: Variations
  • Process Variation: caused by imperfect control of the manufacturing process; appears as variation in channel length, oxide thickness, and doping concentration.
    • Systematic Variation: wafer-to-wafer variation
    • Random Variation: within-die variation
  • Voltage Variation: the supply voltage varies within a chip.
    • IR drop ~ 3%.
    • LDO tolerance ~ 5%.
  • Temperature Variation.
Introduction: Variations
  • To design reliable circuits, the worst-case corner is used as the sign-off criterion.
  • The worst-case corner is a compromise between yield and performance.
  • In sub-100nm technologies, delay variation significantly limits the maximum operable frequency.
Introduction: Techniques to Compensate for Variation
  • System-level techniques can compensate for variation at the expense of performance. [Roy, VLSI Design ‘09]
    • CRISTA: Critical Path Isolation for Timing Adaptiveness.
    • Variation tolerance by trading off quality of results.
  • Memories suffer more from variation than logic because of their smaller transistor sizes, and caches often determine the maximum operable frequency of a system.
  • Our paper presents system-level techniques that improve the performance of a variation-tolerant cache by reducing the usage of cache blocks with large delay variation:
    • LA-LRU: a latency-aware LRU policy.
    • Block rearrangement.
Agenda
  • Adaptive Cache Architecture [Ben Naser, TVLSI ‘08]
  • LA-LRU
  • Block Rearrangement
  • Simulation Results
  • Summary
Adaptive Cache Architecture (1/2)
  • [Ben Naser, TVLSI ‘08]: Monte Carlo simulations using 32nm PTM models showed that accesses to 25% of cache blocks require two cycles, and occasionally even three, to accommodate delay variability.
LA-LRU (1/6)
  • Adaptive cache architecture with the LRU policy.
  • The delay storage records the latency of each cache block, which can be used to modify the replacement policy and increase the fraction of single-cycle accesses.
LA-LRU (2/6)
  • Conventional Least Recently Used (LRU) replacement policy.
  • MRU data can end up in high-latency ways.

LRU mechanism in a conventional cache

(000 -> MRU data, 111->LRU data)
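The bookkeeping above can be sketched in Python. This is a hypothetical illustration, not the paper's hardware: one 8-way set with 3-bit age counters (0 = MRU, 7 = LRU), and a made-up per-way latency map standing in for variation:

```python
# Sketch of conventional LRU in one 8-way cache set.
# age[w] is a 3-bit counter: 0 (MRU) ... 7 (LRU).
# Per-way latencies (cycles) are made-up values for illustration.
WAYS = 8
latency = [1, 1, 2, 1, 3, 1, 2, 1]   # hypothetical variation map
age = list(range(WAYS))              # way 0 starts as MRU

def access(way):
    """Touch `way`: it becomes MRU; younger entries age by one."""
    old = age[way]
    for w in range(WAYS):
        if age[w] < old:
            age[w] += 1
    age[way] = 0

def victim():
    """Conventional LRU evicts the oldest way, ignoring latency."""
    return age.index(WAYS - 1)

# After touching way 4 (a slow, 3-cycle way), the MRU data
# sits in a high-latency way -- exactly the problem LA-LRU targets.
access(4)
print(age[4], latency[4])    # 0 3 -> MRU data in a 3-cycle way
```

Note that `victim()` never consults `latency`, so the most frequently reused data can keep landing in slow ways.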

LA-LRU (3/6)
  • Latency-Aware Least Recently Used (LA-LRU) replacement policy.
  • LRU data is always kept in the high-latency ways.

LA-LRU mechanism

(000 -> MRU data, 111->LRU data)
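One way to realize the LA-LRU invariant above (recency order matches latency order, so LRU data always sits in the slowest ways) is sketched below. This is a simplification for illustration, not the paper's exact exchange logic; the latency map is hypothetical:

```python
# Sketch of the LA-LRU invariant in one 8-way set: the i-th most
# recently used block always occupies the i-th fastest way, so the
# LRU block ends up in the slowest way.
latency = [1, 1, 2, 1, 3, 1, 2, 1]           # hypothetical variation map
fast_order = sorted(range(8), key=lambda w: latency[w])
ways = [None] * 8                             # block tag stored per way
recency = []                                  # recency[0] = MRU tag

def access(tag):
    """Touch `tag`, then re-pack blocks so recency matches latency."""
    if tag in recency:
        recency.remove(tag)
    recency.insert(0, tag)
    del recency[8:]                           # evict the LRU tag on overflow
    for rank, t in enumerate(recency):        # MRU -> fastest way, etc.
        ways[fast_order[rank]] = t

for t in ["A", "B", "C", "D"]:
    access(t)
access("A")                                   # "A" becomes MRU again
# MRU block "A" now sits in a 1-cycle way; the 3-cycle way (way 4)
# is only used once the set fills, and then holds the LRU block.
print(ways[fast_order[0]], latency[fast_order[0]])   # A 1
```

Real hardware would exchange only a pair of blocks per access rather than re-packing the whole set; the full re-pack here just makes the invariant explicit.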

LA-LRU (4/6)
  • Cache access distribution.
  • Significant increase in one-cycle accesses.
LA-LRU (5/6)
  • Architecture Block Diagram: 64KB/8/32
LA-LRU (6/6)
  • Latency overhead with LA-LRU
    • Hit in ways with latency 1 = 1 cycle
    • Hit in ways with latency 2 = 2-4 cycles
    • Hit in ways with latency 3 = 3-6 cycles
    • Miss in cache = cache miss penalty
  • We assume the tag array is unaffected by process variation and that exchanges within it can be completed in one cycle.
  • Synthesis using Synopsys shows that the power overhead of the LA-LRU logic is only 3.5% of that of the LRU logic.
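As a back-of-the-envelope check, the latency table above can be folded into an expected access latency. The hit fractions, miss rate, and miss penalty below are hypothetical, not figures from the paper:

```python
# Hypothetical hit distribution across latency classes (fractions),
# combined with the worst-case cycle costs listed above.
hits = {1: 0.90, 2: 0.08, 3: 0.01}    # made-up fractions per latency class
miss_rate = 0.01
miss_penalty = 20                      # cycles, assumed
cycles = {1: 1, 2: 4, 3: 6}            # worst case per the slide

# Expected cycles per access = sum of (fraction * cost) over outcomes.
avg = sum(frac * cycles[lat] for lat, frac in hits.items())
avg += miss_rate * miss_penalty
print(round(avg, 2))                   # 1.48
```

The more accesses LA-LRU shifts into the latency-1 class, the closer this average gets to one cycle.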
Block Rearrangement (1/1)
  • High-latency blocks are randomly distributed across sets.
  • Modify the address decoder so that high-latency ways are distributed uniformly among sets.
  • Overhead of Perfect BRT: log2(number of sets) mux stages in the decoder.
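A paired variant of this idea can be sketched as follows. This is an illustrative simplification, not the paper's decoder design: within each pair of adjacent sets, same-index ways are exchanged until slow ways are split as evenly as possible; the latency maps are hypothetical:

```python
# Sketch of paired block rearrangement (Paired BRT): within each pair
# of adjacent sets, exchange same-way blocks so slow (latency > 1)
# ways are split as evenly as possible between the two sets.
def paired_brt(lat):
    """lat[s][w] = latency of way w in set s; sets paired as (2k, 2k+1)."""
    for s in range(0, len(lat), 2):
        a, b = lat[s], lat[s + 1]
        # While one set of the pair has 2+ more slow ways, move one over
        # by swapping a slow way with the same-index fast way opposite it.
        while abs(excess := sum(x > 1 for x in a) - sum(x > 1 for x in b)) >= 2:
            src, dst = (a, b) if excess > 0 else (b, a)
            w = next(i for i in range(len(src)) if src[i] > 1 and dst[i] == 1)
            src[w], dst[w] = dst[w], src[w]
    return lat

sets = [[3, 2, 2, 1], [1, 1, 1, 1]]    # hypothetical: 3 slow ways vs. 0
paired_brt(sets)
print(sets)                             # slow ways now split 2 / 1
```

Perfect BRT would instead remap the set index across all sets, which is what costs log2(number of sets) mux stages in the decoder.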

No block rearrangement vs. perfect block rearrangement vs. paired block rearrangement
Simulation Results (1/3)
  • Comparing the following architectures:
    • NPV: no process variation.
    • WORST: each access takes three cycles.
    • ADAPT: cache access latency depends on the latency of the block.
    • LA-LRU: the proposed replacement policy.
    • LA-LRU with Paired BRT: LA-LRU with block rearrangement within two adjacent sets.
    • LA-LRU with Perfect BRT: LA-LRU with block rearrangement amongst all sets.
Simulation Results (2/3)
  • Simulation environment:
    • Modified Wattch/SimpleScalar to measure performance for XScale-, PowerPC-, and Alpha21265-like processor configurations using SPEC2000 benchmarks.
    • Generated random latency distributions for the following two variation models:
      • 15% two-cycle and 0% three-cycle latency blocks.
      • 25% two-cycle and 1% three-cycle latency blocks.
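The two variation models above can be instantiated by drawing a random latency per block, e.g. (a sketch; the block count and RNG seed are arbitrary choices, not values from the paper):

```python
import random

def latency_map(n_blocks, p2, p3, seed=0):
    """Assign each block a latency of 1, 2, or 3 cycles with
    probabilities (1 - p2 - p3, p2, p3), mimicking the models above."""
    rng = random.Random(seed)
    return rng.choices([1, 2, 3], weights=[1 - p2 - p3, p2, p3], k=n_blocks)

model_a = latency_map(2048, 0.15, 0.00)   # 15% two-cycle, 0% three-cycle
model_b = latency_map(2048, 0.25, 0.01)   # 25% two-cycle, 1% three-cycle
print(sum(l == 2 for l in model_a) / 2048)   # ~0.15
```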
Simulation Results (3/3)
  • Average degradation in memory access latency.
  • LA-LRU alone is sufficient for XScale and PowerPC.
  • LA-LRU + Perfect BRT outperforms LA-LRU alone for the Alpha21265.
Summary (1/1)
  • LA-LRU, combined with adaptive techniques that vary cache access latency, significantly improves the performance of caches affected by variation.
  • LA-LRU reduces the variation-induced degradation in memory access latency to almost zero for almost any cache configuration.
  • For low-associativity caches, block rearrangement combined with LA-LRU can further reduce any remaining performance degradation.
  • The power overhead of implementing LA-LRU is negligible: the LA-LRU logic is exercised less than 1% of the time and consumes only 3.5% of the power of the LRU logic.
  • We observed similar results for other replacement policies such as FIFO.