Adaptive Cache Compression for High-Performance Processors

Alaa Alameldeen and David Wood

University of Wisconsin-Madison

Wisconsin Multifacet Project

http://www.cs.wisc.edu/multifacet

Overview
  • Design of high-performance processors
    • Processor speed improves faster than memory speed
  • Memory latency dominates performance
    • Need more effective cache designs
  • On-chip cache compression
    • Increases effective cache size
    • Increases cache hit latency
  • Does cache compression help or hurt?

Alaa Alameldeen – Adaptive Cache Compression

Does Cache Compression Help or Hurt?
  • Adaptive Compression determines when compression is beneficial

Outline
  • Motivation
  • Cache Compression Framework
    • Compressed Cache Hierarchy
    • Decoupled Variable-Segment Cache
  • Adaptive Compression
  • Evaluation
  • Conclusions


Compressed Cache Hierarchy

[Figure: the instruction fetcher and load-store queue access uncompressed L1 I- and D-caches; the L2 cache is compressed. Lines evicted through the L1 victim cache enter L2 via a compression pipeline, L2 hits return through a decompression pipeline, and uncompressed lines take a bypass path. The L2 connects to and from memory.]
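The read path implied by the figure can be summarized in a small latency sketch. This is illustrative only: the 5-cycle decompression comes from the FPC backup slides, but the L2-hit and memory latencies here are assumed example values, not the talk's configuration.

```python
# Hedged sketch of the L2 read path: hits to compressed lines pay the
# decompression pipeline; uncompressed lines take the bypass path.
DECOMPRESSION_CYCLES = 5  # FPC decompresses a 64-byte line in five cycles

def l2_read_latency(hit: bool, compressed: bool,
                    l2_hit_cycles: int = 20, memory_cycles: int = 400) -> int:
    """Cycles to return a line from L2, or from memory on a miss.
    l2_hit_cycles and memory_cycles are assumed example values."""
    if not hit:
        return memory_cycles                               # fetch from memory
    extra = DECOMPRESSION_CYCLES if compressed else 0      # bypass if uncompressed
    return l2_hit_cycles + extra

print(l2_read_latency(hit=True, compressed=True))   # 25: hit plus decompression
```

This is the core trade-off of the talk: compression adds a fixed decompression delay to some hits in exchange for turning some misses into hits.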

Decoupled Variable-Segment Cache
  • Objective: pack more lines into the same space
  • Start from a 2-way set-associative cache with 64-byte lines
    • Each tag contains the address tag, permissions, and LRU (replacement) bits
  • Add two more tags per set (Addresses A-D)
  • Add compression status, compressed size, and more LRU bits to each tag
  • Divide the data area into 8-byte segments; data lines are composed of 1-8 segments
  • Example set: Addr A uncompressed (CSize 3), Addr B compressed (CSize 2), Addr C compressed (CSize 6), Addr D compressed (CSize 4)
    • Addr D's tag is present but its line isn't
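The segment bookkeeping for the example set above can be sketched as follows. The dict layout and names are assumptions of this sketch, not the paper's implementation; the key point is that CSize is kept even for lines stored uncompressed, which occupy all 8 segments.

```python
# Illustrative bookkeeping for one set of a decoupled variable-segment cache:
# four tags share a data area of sixteen 8-byte segments (two 64-byte lines).
SEGMENTS_PER_SET = 16     # room for two uncompressed 64-byte lines
UNCOMPRESSED_SEGS = 8     # an uncompressed line always occupies 8 segments

# Each tag records (status, CSize); CSize is the line's size in segments
# if it were stored compressed, tracked even for uncompressed lines.
tags = {
    "A": ("uncompressed", 3),
    "B": ("compressed",   2),
    "C": ("compressed",   6),
    "D": ("compressed",   4),   # tag present, but line not resident
}
resident = ["A", "B", "C"]

def segments_used(name: str) -> int:
    status, csize = tags[name]
    return UNCOMPRESSED_SEGS if status == "uncompressed" else csize

actual = sum(segments_used(n) for n in resident)       # 8 + 2 + 6 = 16
if_all_compressed = sum(c for _, c in tags.values())   # 3 + 2 + 6 + 4 = 15

print(actual, if_all_compressed)   # 16 15: D would fit if A were compressed
```

The gap between the two sums is exactly what the adaptive scheme reasons about: storing A compressed would free enough segments to hold D's line as well.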

Outline
  • Motivation
  • Cache Compression Framework
  • Adaptive Compression
    • Key Insight
    • Classification of L2 accesses
    • Global compression predictor
  • Evaluation
  • Conclusions


Adaptive Compression

[Figure: if Benefit(Compression) > Cost(Compression), compress future lines; otherwise do not compress future lines]

  • Use past behavior to predict the future
  • Key Insight:
    • LRU Stack [Mattson, et al., 1970] indicates for each reference whether compression helps or hurts

Cost/Benefit Classification

[Figure: the set's four tags in LRU-stack order alongside the data area – Addr A uncompressed (CSize 3), Addr B compressed (CSize 2), Addr C compressed (CSize 6), Addr D compressed (CSize 4)]

  • Classify each cache reference
  • Four-way set-associative cache with space for two 64-byte lines
    • Total of 16 available segments

An Unpenalized Hit
  • Read/Write Address A
    • LRU stack order = 1 ≤ 2 ⇒ hit regardless of compression
    • Uncompressed line ⇒ no decompression penalty
    • Neither cost nor benefit

A Penalized Hit
  • Read/Write Address B
    • LRU stack order = 2 ≤ 2 ⇒ hit regardless of compression
    • Compressed line ⇒ decompression penalty incurred
    • Compression cost

An Avoided Miss
  • Read/Write Address C
    • LRU stack order = 3 > 2 ⇒ hit only because of compression
    • Compression benefit: eliminated an off-chip miss

An Avoidable Miss
  • Read/Write Address D
    • Line is not in the cache, but its tag exists at LRU stack order = 4
    • Missed only because some lines are not compressed
    • Potential compression benefit: Sum(CSize) = 3 + 2 + 6 + 4 = 15 ≤ 16 segments
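The avoidable-miss test is just the tags' CSize fields summed against the set's 16 segments:

```python
# CSize fields from the four tags in the running example (in segments)
csizes = {"A": 3, "B": 2, "C": 6, "D": 4}
total = sum(csizes.values())
print(total, total <= 16)   # 15 True: had all lines been compressed, D would fit
```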

An Unavoidable Miss
  • Read/Write Address E
    • Line is not in the cache and no tag exists for it
    • LRU stack order > 4 ⇒ compression wouldn't have helped
    • Neither cost nor benefit

Compression Predictor
  • Estimate: Benefit(Compression) – Cost(Compression)
  • Single counter: Global Compression Predictor (GCP)
    • Saturating up/down 19-bit counter
  • GCP updated on each cache access
    • Benefit: increment by the memory latency
    • Cost: decrement by the decompression latency
    • Optimization: normalize to decompression latency = 1
  • Cache Allocation
    • Allocate compressed lines if GCP ≥ 0
    • Allocate uncompressed lines if GCP < 0
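The update rule above can be sketched as a single saturating counter. This is a hedged sketch: the 400-cycle memory latency is an assumed example value for the normalization, and counting avoidable misses as benefit alongside avoided misses is an assumption of this sketch based on the access classes above.

```python
# Global Compression Predictor sketch, normalized so one decompression = 1 unit.
GCP_BITS = 19
GCP_MAX = 2 ** (GCP_BITS - 1) - 1       # saturating signed counter bounds
GCP_MIN = -(2 ** (GCP_BITS - 1))

BENEFIT = 400 // 5   # assumed memory latency / 5-cycle decompression = 80 units

gcp = 0

def on_access(kind: str) -> None:
    """Update the single global counter from the access classification."""
    global gcp
    if kind in ("avoided_miss", "avoidable_miss"):
        gcp = min(GCP_MAX, gcp + BENEFIT)   # compression saved (or could have
                                            # saved) an off-chip memory access
    elif kind == "penalized_hit":
        gcp = max(GCP_MIN, gcp - 1)         # compression cost one decompression

def allocate_compressed() -> bool:
    return gcp >= 0                         # compress future lines iff GCP >= 0

# 100 penalized hits outweigh one avoided miss at this latency ratio:
for k in ["penalized_hit"] * 100 + ["avoided_miss"]:
    on_access(k)
print(gcp, allocate_compressed())   # -20 False: stop compressing new lines
```

Because the counter is global, a single prediction steers allocation policy for the whole cache, which is what lets Adaptive track whichever of Always or Never is currently winning.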

Outline
  • Motivation
  • Cache Compression Framework
  • Adaptive Compression
  • Evaluation
    • Simulation Setup
    • Performance
  • Conclusions


Simulation Setup
  • Simics full system simulator augmented with:
    • Detailed OoO processor simulator [TFSim, Mauer, et al., 2002]
    • Detailed memory timing simulator [Martin, et al., 2002]
  • Workloads:
    • Commercial workloads:
      • Database servers: OLTP and SPECJBB
      • Static Web serving: Apache and Zeus
    • SPEC2000 benchmarks:
      • SPECint: bzip, gcc, mcf, twolf
      • SPECfp: ammp, applu, equake, swim


System Configuration
  • A dynamically scheduled SPARC V9 uniprocessor
  • Configuration parameters:


Simulated Cache Configurations
  • Always: All compressible lines are stored in compressed format
    • Decompression penalty for all compressed lines
  • Never: All cache lines are stored in uncompressed format
    • Cache is 8-way set associative with half the number of sets
    • Does not incur decompression penalty
  • Adaptive: Our adaptive compression scheme


Performance

[Figure: performance for SpecINT, SpecFP, and Commercial workloads under the Always, Never, and Adaptive configurations; callouts mark a 35% speedup, an 18% slowdown, and a bug in the GCP update]

  • Adaptive performs similar to the best of Always and Never

Effective Cache Capacity

Cache Miss Rates

[Figure: per-workload chart. Misses per 1000 instructions: 0.09, 2.52, 12.28, 14.38. Penalized hits per avoided miss: 6709, 489, 12.3, 4.7]

Adapting to L2 Sizes

[Figure: per-L2-size chart. Misses per 1000 instructions: 104.8, 36.9, 0.09, 0.05. Penalized hits per avoided miss: 0.93, 5.7, 6503, 326000]

Conclusions
  • Cache compression increases effective cache capacity but increases cache hit time
    • Helps some benchmarks (e.g., apache, mcf)
    • Hurts other benchmarks (e.g., gcc, ammp)
  • Our Proposal: Adaptive compression
    • Uses (LRU) replacement stack to determine whether compression helps or hurts
    • Updates a single global saturating counter on cache accesses
  • Adaptive compression performs similar to the better of Always Compress and Never Compress


Backup Slides
  • Frequent Pattern Compression (FPC)
  • Decoupled Variable-Segment Cache
  • Classification of L2 Accesses
  • (LRU) Stack Replacement
  • Cache Miss Rates
  • Adapting to L2 Sizes – mcf
  • Adapting to L1 Size
  • Adapting to Decompression Latency – mcf
  • Adapting to Decompression Latency – ammp
  • Phase Behavior – gcc
  • Phase Behavior – mcf
  • Can We Do Better Than Adaptive?


Decoupled Variable-Segment Cache
  • Each set contains four tags and space for two uncompressed lines
  • Data area divided into 8-byte segments
  • Each tag is composed of:
    • Address tag, permissions, and LRU/replacement bits (same as an uncompressed cache)
    • CStatus: 1 if the line is compressed, 0 otherwise
    • CSize: size of the compressed line in segments

Frequent Pattern Compression
  • A significance-based compression algorithm
  • Related Work:
    • X-Match and X-RL Algorithms [Kjelso, et al., 1996]
    • Address and data significance-based compression [Farrens and Park, 1991, Citron and Rudolph, 1995, Canal, et al., 2000]
  • A 64-byte line is decompressed in five cycles
  • More details in technical report:
    • “Frequent Pattern Compression: A Significance-Based Compression Algorithm for L2 Caches,” Alaa R. Alameldeen and David A. Wood, Dept. of Computer Sciences Technical Report CS-TR-2004-1500, April 2004 (available online).


Frequent Pattern Compression (FPC)
  • A significance-based compression algorithm combined with zero run-length encoding
    • Compresses each 32-bit word separately
    • Suitable for short (32-256 byte) cache lines
    • Compressible patterns: zero runs; sign-extended 4-, 8-, and 16-bit words; zero-padded half-word; two sign-extended half-words; repeated byte
    • A 64-byte line is decompressed in a five-stage pipeline
  • More details in the technical report cited on the previous slide
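The per-word pattern match can be sketched as below. Real FPC emits a 3-bit prefix per word and run-length-encodes zero runs; this sketch only reports which pattern a single 32-bit word would match, and the test order and the placement of the zero half-word are assumptions.

```python
# Hedged sketch of FPC's frequent-pattern test for one 32-bit word.
def fits_signed(w: int, bits: int) -> bool:
    """Does the 32-bit word, read as a signed value, fit in `bits` bits?"""
    signed = w - (1 << 32) if w & 0x80000000 else w
    return -(1 << (bits - 1)) <= signed < (1 << (bits - 1))

def fpc_pattern(w: int) -> str:
    w &= 0xFFFFFFFF
    if w == 0:
        return "zero"                      # zero words join zero runs
    if fits_signed(w, 4):
        return "sign-extended 4-bit"
    if fits_signed(w, 8):
        return "sign-extended 8-bit"
    if fits_signed(w, 16):
        return "sign-extended 16-bit"
    if w & 0xFFFF == 0:
        return "zero-padded half-word"     # assumes the zero half is the low one
    hi, lo = w >> 16, w & 0xFFFF
    if all(h <= 0x7F or h >= 0xFF80 for h in (hi, lo)):
        return "two sign-extended half-words"
    b = [(w >> s) & 0xFF for s in (24, 16, 8, 0)]
    if len(set(b)) == 1:
        return "repeated byte"
    return "uncompressed"

print(fpc_pattern(0xFFFFFFFF))   # sign-extended 4-bit (the value -1)
```

Words like small integers and sign-extended pointers dominate many workloads, which is why matching these few patterns compresses well.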

Classification of L2 Accesses
  • Cache hits:
    • Unpenalized hit: hit to an uncompressed line that would have hit without compression
    • Penalized hit: hit to a compressed line that would have hit without compression
    • Avoided miss: hit to a line that would NOT have hit without compression
  • Cache misses:
    • Avoidable miss: miss to a line that would have hit with compression
    • Unavoidable miss: miss to a line that would have missed even with compression
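The five classes above can be sketched as one function over the LRU tag stack, for the running example of a 4-tag set with a 16-segment data area. The signature and bookkeeping are illustrative assumptions, not the simulator's code.

```python
# Hedged sketch of per-access classification using the LRU tag stack.
TAGS_PER_SET = 4
UNCOMPRESSED_WAYS = 2      # lines that would fit without compression
SEGMENTS_PER_SET = 16

def classify(stack_pos, resident, compressed, csize_sum):
    """
    stack_pos:  LRU stack position of the referenced tag (1 = MRU),
                or None if no tag matches.
    resident:   True if the line's data is in the data area.
    compressed: True if the resident line is stored compressed.
    csize_sum:  sum of CSize over tags at stack positions 1..stack_pos.
    """
    if stack_pos is None:
        return "unavoidable miss"           # no tag: compression couldn't help
    if resident:
        if stack_pos <= UNCOMPRESSED_WAYS:  # would hit even without compression
            return "penalized hit" if compressed else "unpenalized hit"
        return "avoided miss"               # hit only because of compression
    if csize_sum <= SEGMENTS_PER_SET:       # all lines compressed would have fit
        return "avoidable miss"
    return "unavoidable miss"

# The five accesses from the talk's running example:
print(classify(1, True, False, 3))       # unpenalized hit   (Address A)
print(classify(2, True, True, 5))        # penalized hit     (Address B)
print(classify(3, True, True, 11))       # avoided miss      (Address C)
print(classify(4, False, True, 15))      # avoidable miss    (Address D)
print(classify(None, False, False, 0))   # unavoidable miss  (Address E)
```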

(LRU) Stack Replacement
  • How to differentiate penalized hits from avoided misses?
    • Only hits to the top half of the tags in the LRU stack are penalized hits
  • How to differentiate avoidable from unavoidable misses?
    • A miss is avoidable only if a tag for the line is still present
  • Does not depend on LRU replacement
    • Any replacement algorithm can be used for the top half of tags
    • Any stack algorithm for the remaining tags

Cache Miss Rates

Adapting to L2 Sizes – mcf

[Figure: per-L2-size chart. Misses per 1000 instructions: 98.9, 88.1, 12.4, 0.02. Penalized hits per avoided miss: 11.6, 4.4, 12.6, 2×10⁶]

Adapting to L1 Size

Adapting to Decompression Latency – mcf

Adapting to Decompression Latency – ammp

Phase Behavior – gcc

[Figure: predictor value (K) and cache size (MB) over time]

Phase Behavior – mcf

[Figure: predictor value (K) and cache size (MB) over time]

Can We Do Better Than Adaptive?
  • Optimal is an unrealistic configuration: Always with no decompression penalty