Adaptive Cache Compression for High-Performance Processors

Alaa Alameldeen and David Wood

University of Wisconsin-Madison

Wisconsin Multifacet Project

http://www.cs.wisc.edu/multifacet


Overview

  • Design of high-performance processors

    • Processor speed improves faster than memory speed

  • Memory latency dominates performance

    • Need more effective cache designs

  • On-chip cache compression

    • Increases effective cache size

    • Increases cache hit latency

  • Does cache compression help or hurt?

Alaa Alameldeen – Adaptive Cache Compression


Does Cache Compression Help or Hurt?

  • Adaptive Compression determines when compression is beneficial


Outline

  • Motivation

  • Cache Compression Framework

    • Compressed Cache Hierarchy

    • Decoupled Variable-Segment Cache

  • Adaptive Compression

  • Evaluation

  • Conclusions

Compressed Cache Hierarchy

[Figure: the instruction fetcher and load-store queue feed uncompressed L1 I- and D-caches; the L2 cache is compressed. A compression pipeline sits on the path into the L2 and a decompression pipeline on the path out, with a bypass for uncompressed lines; an L1 victim cache and the to/from-memory paths complete the hierarchy.]
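The hierarchy implies a simple latency rule for L2 hits. A minimal sketch with assumed cycle counts (the deck only states the five-cycle decompression pipeline; the L2 hit latency here is illustrative):

```python
# Sketch of the L2 hit path above: compressed lines go through the
# decompression pipeline, uncompressed lines take the bypass.
L2_HIT_LATENCY = 20          # cycles, assumed for illustration
DECOMPRESSION_LATENCY = 5    # cycles (five-stage pipeline, per the FPC slides)

def l2_hit_latency(line_is_compressed):
    """Total cycles to return an L2 hit to the L1."""
    if line_is_compressed:
        return L2_HIT_LATENCY + DECOMPRESSION_LATENCY  # through the pipeline
    return L2_HIT_LATENCY                              # uncompressed-line bypass

print(l2_hit_latency(True), l2_hit_latency(False))
```

This is the trade-off the adaptive scheme later weighs: extra effective capacity against a decompression penalty on every compressed hit.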


Decoupled Variable-Segment Cache

  • Objective: pack more lines into the same space

  • 2-way set-associative with 64-byte lines

  • Tag contains Address Tag, Permissions, LRU (Replacement) Bits

[Figure, built up over several slides: two more tags are added per set (Addresses A-D), along with Compression Status, Compressed Size, and more LRU bits; the data area is divided into 8-byte segments, and each data line is composed of 1-8 segments. Final state: Addr A uncompressed (size 3), Addr B compressed (size 2), Addr C compressed (size 6), Addr D compressed (size 4); Addr D's tag is present but its line isn't.]
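The set layout can be sketched in a few lines. The packing rule below (an uncompressed line occupies all 8 of its segments while its CSize records what it would compress to, and tags past the point where segments run out keep only their tag) is an assumption consistent with the slide's final state:

```python
# Sketch of one set: four tags share a data area of sixteen 8-byte
# segments, i.e. space for two uncompressed 64-byte lines.
from dataclasses import dataclass

SEGMENTS_PER_SET = 16

@dataclass
class Tag:
    addr: str
    compressed: bool
    csize: int          # compressed size in segments (1-8)

def occupied(t):
    # An uncompressed line fills all 8 segments regardless of its CSize
    return t.csize if t.compressed else 8

def resident_lines(tags):
    """Pack lines into the data area in tag order; tags whose data no
    longer fits stay present without a line (like Addr D on the slide)."""
    used, resident = 0, []
    for t in tags:
        if used + occupied(t) <= SEGMENTS_PER_SET:
            used += occupied(t)
            resident.append(t.addr)
    return resident

# Final state from the slide
tags = [Tag("A", False, 3), Tag("B", True, 2), Tag("C", True, 6), Tag("D", True, 4)]
print(resident_lines(tags))         # A, B, and C fit; D has a tag only
print(sum(t.csize for t in tags))   # Sum(CSize) = 15 <= 16
```

Note how keeping CSize for the uncompressed line A is what later lets the adaptive scheme ask whether D's miss would have been avoided had everything been compressed.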


Outline

  • Motivation

  • Cache Compression Framework

  • Adaptive Compression

    • Key Insight

    • Classification of L2 accesses

    • Global compression predictor

  • Evaluation

  • Conclusions

Adaptive Compression

  • Use past to predict future

  • Key Insight:

    • LRU Stack [Mattson, et al., 1970] indicates for each reference whether compression helps or hurts

[Flowchart: if Benefit(Compression) > Cost(Compression), compress future lines; otherwise, do not compress future lines.]


Cost/Benefit Classification

[Figure: an LRU stack of four tags over the data area; Addr A uncompressed (size 3), Addr B compressed (size 2), Addr C compressed (size 6), Addr D compressed (size 4).]

  • Classify each cache reference

  • Four-way SA cache with space for two 64-byte lines

    • Total of 16 available segments


An Unpenalized Hit

  • Read/Write Address A

    • LRU Stack order = 1 ≤ 2 ⇒ Hit regardless of compression

    • Uncompressed Line ⇒ No decompression penalty

    • Neither cost nor benefit


A Penalized Hit

  • Read/Write Address B

    • LRU Stack order = 2 ≤ 2 ⇒ Hit regardless of compression

    • Compressed Line ⇒ Decompression penalty incurred

    • Compression cost


An Avoided Miss

  • Read/Write Address C

    • LRU Stack order = 3 > 2 ⇒ Hit only because of compression

    • Compression benefit: Eliminated off-chip miss


An Avoidable Miss

  • Read/Write Address D

    • Line is not in the cache but its tag exists at LRU stack order = 4

    • Missed only because some lines are not compressed

    • Potential compression benefit: Sum(CSize) = 15 ≤ 16


An Unavoidable Miss

  • Read/Write Address E

    • LRU stack order > 4 ⇒ Compression wouldn't have helped

    • Line is not in the cache and its tag does not exist

    • Neither cost nor benefit
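The five cases reduce to a small decision procedure. A sketch for this 4-way set, where `stack_order` is the 1-based LRU-stack position of the referenced tag (`None` if the tag is absent), `resident` says whether the line's data is actually in the set, and `sum_csize` is Sum(CSize) over the set's tags:

```python
# Classify one L2 reference, following the five slides above.
UNCOMPRESSED_WAYS = 2    # lines that would fit with no compression at all
SEGMENTS_PER_SET = 16

def classify(stack_order, resident, compressed, sum_csize):
    if stack_order is None:
        return "unavoidable miss"             # tag absent: compression can't help
    if stack_order <= UNCOMPRESSED_WAYS:      # would hit even without compression
        return "penalized hit" if compressed else "unpenalized hit"
    if resident:
        return "avoided miss"                 # hit only because of compression
    if sum_csize <= SEGMENTS_PER_SET:
        return "avoidable miss"               # would have hit had all lines compressed
    return "unavoidable miss"

# The slides' running example (Sum(CSize) = 15):
print(classify(1, True, False, 15))      # Addr A: unpenalized hit
print(classify(2, True, True, 15))       # Addr B: penalized hit
print(classify(3, True, True, 15))       # Addr C: avoided miss
print(classify(4, False, True, 15))      # Addr D: avoidable miss
print(classify(None, False, False, 15))  # Addr E: unavoidable miss
```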


Compression Predictor

  • Estimate: Benefit(Compression) – Cost(Compression)

  • Single counter: Global Compression Predictor (GCP)

    • Saturating up/down 19-bit counter

  • GCP updated on each cache access

    • Benefit: Increment by memory latency

    • Cost: Decrement by decompression latency

    • Optimization: Normalize to decompression latency = 1

  • Cache Allocation

    • Allocate compressed lines if GCP ≥ 0

    • Allocate uncompressed lines if GCP < 0
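A sketch of the predictor logic. The 80:1 memory-to-decompression latency ratio is illustrative, and counting avoidable misses as benefit is an inference from the classification slides rather than something stated here:

```python
# Global Compression Predictor: one saturating up/down counter,
# normalized so that a single decompression costs 1.
GCP_MIN, GCP_MAX = -(2 ** 18), 2 ** 18 - 1   # 19-bit saturating counter
BENEFIT = 80                                 # normalized memory latency (assumed)

gcp = 0

def update_gcp(outcome):
    """Update the GCP on one L2 access; only two of the five classes move it."""
    global gcp
    if outcome in ("avoided miss", "avoidable miss"):
        gcp = min(GCP_MAX, gcp + BENEFIT)    # benefit: a miss compression removes
    elif outcome == "penalized hit":
        gcp = max(GCP_MIN, gcp - 1)          # cost: one decompression
    # unpenalized hits and unavoidable misses: neither cost nor benefit

def allocate_compressed():
    """Allocation policy: compress future lines while that looks beneficial."""
    return gcp >= 0

for outcome in ("avoided miss", "penalized hit", "penalized hit"):
    update_gcp(outcome)
print(gcp, allocate_compressed())
```

A single global counter suffices because the classification, not the predictor, carries all the per-set information.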


Outline

  • Motivation

  • Cache Compression Framework

  • Adaptive Compression

  • Evaluation

    • Simulation Setup

    • Performance

  • Conclusions

Simulation Setup

  • Simics full system simulator augmented with:

    • Detailed OoO processor simulator [TFSim, Mauer, et al., 2002]

    • Detailed memory timing simulator [Martin, et al., 2002]

  • Workloads:

    • Commercial workloads:

      • Database servers: OLTP and SPECJBB

      • Static Web serving: Apache and Zeus

    • SPEC2000 benchmarks:

      • SPECint: bzip, gcc, mcf, twolf

      • SPECfp: ammp, applu, equake, swim

System Configuration

  • A dynamically scheduled SPARC V9 uniprocessor

  • Configuration parameters: [table not preserved in transcript]

Simulated Cache Configurations

  • Always: All compressible lines are stored in compressed format

    • Decompression penalty for all compressed lines

  • Never: All cache lines are stored in uncompressed format

    • Cache is 8-way set associative with half the number of sets

    • Does not incur decompression penalty

  • Adaptive: Our adaptive compression scheme

Performance

[Charts, built up over several slides: normalized runtime for SpecINT, SpecFP, and commercial workloads under the three configurations. Annotations: "35% Speedup", "18% Slowdown", "Bug in GCP update", and "Adaptive performs similar to the best of Always and Never".]


Effective Cache Capacity



Cache Miss Rates

[Chart annotations: Misses per 1000 instructions: 0.09, 2.52, 12.28, 14.38; Penalized hits per avoided miss: 6709, 489, 12.3, 4.7.]


Adapting to L2 Sizes

[Chart annotations: Misses per 1000 instructions: 104.8, 36.9, 0.09, 0.05; Penalized hits per avoided miss: 0.93, 5.7, 6503, 326000.]


Conclusions

  • Cache compression increases effective cache capacity but increases cache hit latency

    • Helps some benchmarks (e.g., apache, mcf)

    • Hurts other benchmarks (e.g., gcc, ammp)

  • Our Proposal: Adaptive compression

    • Uses (LRU) replacement stack to determine whether compression helps or hurts

    • Updates a single global saturating counter on cache accesses

  • Adaptive compression performs similar to the better of Always Compress and Never Compress

Backup Slides

  • Frequent Pattern Compression (FPC)

  • Decoupled Variable-Segment Cache

  • Classification of L2 Accesses

  • (LRU) Stack Replacement

  • Cache Miss Rates

  • Adapting to L2 Sizes – mcf

  • Adapting to L1 Size

  • Adapting to Decompression Latency – mcf

  • Adapting to Decompression Latency – ammp

  • Phase Behavior – gcc

  • Phase Behavior – mcf

  • Can We Do Better Than Adaptive?

Decoupled Variable-Segment Cache

  • Each set contains four tags and space for two uncompressed lines

  • Data area divided into 8-byte segments

  • Each tag is composed of:

    • Address tag

    • Permissions

    • CStatus: 1 if the line is compressed, 0 otherwise

    • CSize: Size of compressed line in segments

    • LRU/replacement bits

  • Address tag, permissions, and LRU bits are the same as in an uncompressed cache

Frequent Pattern Compression (FPC)

  • A significance-based compression algorithm combined with zero run-length encoding

    • Compresses each 32-bit word separately

    • Suitable for short (32-256 byte) cache lines

    • Compressible patterns: zero runs; sign-extended 4-, 8-, and 16-bit values; zero-padded half-word; two sign-extended half-words; repeated byte

    • A 64-byte line is decompressed in a five-stage pipeline (five cycles)

  • Related Work:

    • X-Match and X-RL Algorithms [Kjelso, et al., 1996]

    • Address and data significance-based compression [Farrens and Park, 1991; Citron and Rudolph, 1995; Canal, et al., 2000]

  • More details in technical report:

    • "Frequent Pattern Compression: A Significance-Based Compression Algorithm for L2 Caches," Alaa R. Alameldeen and David A. Wood, Dept. of Computer Sciences Technical Report CS-TR-2004-1500, April 2004 (available online).
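A simplified sketch of the per-word pattern match. The prefix encoding and zero-run-length machinery are omitted, and the payload sizes are rough illustrations rather than the exact bit budgets of the tech report:

```python
def fpc_pattern(word):
    """Classify one 32-bit word into an FPC-style pattern.
    Returns (pattern name, approximate payload bits)."""
    word &= 0xFFFFFFFF
    signed = word - (1 << 32) if word & 0x80000000 else word
    if word == 0:
        return ("zero run", 3)                    # folded into a run length
    if -(1 << 3) <= signed < (1 << 3):
        return ("sign-extended 4 bits", 4)
    if -(1 << 7) <= signed < (1 << 7):
        return ("sign-extended 8 bits", 8)
    if -(1 << 15) <= signed < (1 << 15):
        return ("sign-extended 16 bits", 16)
    if word & 0xFFFF == 0:
        return ("zero-padded half-word", 16)      # only the upper half stored
    halves = (word >> 16, word & 0xFFFF)
    if all(-(1 << 7) <= (h - (1 << 16) if h & 0x8000 else h) < (1 << 7) for h in halves):
        return ("two sign-extended half-words", 16)
    b = [(word >> i) & 0xFF for i in (0, 8, 16, 24)]
    if len(set(b)) == 1:
        return ("repeated byte", 8)
    return ("uncompressed", 32)

for w in (0x00000000, 0xFFFFFFFF, 0x00001234, 0xAAAAAAAA, 0xDEADBEEF):
    print(hex(w), fpc_pattern(w))
```

Words that match no pattern are stored verbatim, which is why compressed sizes vary from 1 to 8 segments per line.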


Classification of L2 Accesses

  • Cache hits:

    • Unpenalized hit: Hit to an uncompressed line that would have hit without compression

    • Penalized hit: Hit to a compressed line that would have hit without compression

    • Avoided miss: Hit to a line that would NOT have hit without compression

  • Cache misses:

    • Avoidable miss: Miss to a line that would have hit with compression

    • Unavoidable miss: Miss to a line that would have missed even with compression

(LRU) Stack Replacement

  • How do we differentiate penalized hits from avoided misses?

    • Only hits to the top half of the tags in the LRU stack are penalized hits

  • How do we differentiate avoidable from unavoidable misses?

    • A miss is avoidable only if its tag is still present and the set's compressed sizes fit (Sum(CSize) ≤ 16 segments)

  • The classification does not depend on LRU replacement

    • Any replacement algorithm for the top half of tags

    • Any stack algorithm for the remaining tags


Cache Miss Rates



Adapting to L2 Sizes (mcf)

[Chart annotations: Misses per 1000 instructions: 98.9, 88.1, 12.4, 0.02; Penalized hits per avoided miss: 11.6, 4.4, 12.6, 2×10⁶.]


Adapting to L1 Size

Adapting to Decompression Latency (mcf)

Adapting to Decompression Latency (ammp)

Phase Behavior (gcc)

[Chart: predictor value (K) and cache size (MB) over time.]


Phase Behavior (mcf)

[Chart: predictor value (K) and cache size (MB) over time.]


Can We Do Better Than Adaptive?

  • Optimal is an unrealistic configuration: Always with no decompression penalty
