Ship signature based hit predictor for high performance caching

SHiP: Signature-based Hit Predictor for High Performance Caching. *Carole-Jean Wu, #Aamer Jaleel, #,+William Hasenplaugh, *Margaret Martonosi, #Simon Steely Jr., #,+Joel Emer. *Princeton University, #Intel Corporation, VSSAD, #,+MIT.


SHiP: Signature-based Hit Predictor for High Performance Caching

*Carole-Jean Wu,#Aamer Jaleel, #,+William Hasenplaugh,

*Margaret Martonosi, #Simon Steely Jr., #,+Joel Emer

*Princeton University #Intel Corporation, VSSAD #,+MIT

IEEE/ACM International Symposium on Microarchitecture (MICRO’2011)


Motivation

  • Factors making caching important

    • Increasing ratio of CPU speed to memory speed

    • Multi-core processors make shared cache management more challenging

  • LRU has been the standard LLC replacement policy

    • However, LRU has problems!


Problems with LRU Replacement

  • Working set larger than the cache causes thrashing

[Figure: accesses cycle through a working set (Wsize) larger than the cache (LLCsize); every access shown is a miss]

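  • References to non-temporal data (scans) discard the frequently referenced working set

[Figure: a working set (Wsize) that fits in the cache (LLCsize) is interleaved with three scans; the timeline shows hits turning into misses once scans arrive]

  • Scans occur frequently in commercial workloads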
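Both problems can be reproduced with a toy model. The sketch below is an illustration only, assuming a single fully associative LRU cache with made-up sizes and access patterns; `lru_hits` and the line names are hypothetical and not taken from the talk.

```python
from collections import OrderedDict

def lru_hits(accesses, capacity):
    """Count hits for a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()          # keys ordered from least to most recently used
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return hits

# Thrashing: a 5-line working set cycled through a 4-line cache.
thrash_hits = lru_hits(list(range(5)) * 4, capacity=4)

# Scan: a 3-line working set, a 4-line scan, then the working set again.
working_set = ["a", "b", "c"]
scan = ["s0", "s1", "s2", "s3"]
before_scan = working_set * 2 + scan
hits_before = lru_hits(before_scan, capacity=4)
hits_after = lru_hits(before_scan + working_set, capacity=4) - hits_before
```

In this toy setup the cyclic pattern never hits, and the working set hits only until the scan pushes it out, so every re-reference after the scan misses.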


Desired Behavior from Cache Replacement

  • Working set larger than the cache → preserve some of the working set in the cache

[Figure: part of the working set (Wsize) stays in the cache (LLCsize) and hits; the rest misses]

[ DIP (ISCA’07), DRRIP (ISCA’10) achieve this effect ]

  • Recurring scans → preserve the frequently referenced working set in the cache

[Figure: the working set keeps hitting across three scans]

[ SRRIP (ISCA’10) achieves this effect ]


Dynamic Re-Reference Interval Prediction ( DRRIP )

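  • SRRIP: Scan-Resistant insertion

  • BRRIP: Thrash-Resistant insertion

  • Re-reference prediction values: 0 = immediate, 1 = intermediate, 2 = far, 3 = distant

[Figure: state diagram over the four values with insertion, re-reference, and eviction arrows; values 0 to 2 are marked "No Victim" and eviction is taken from 3 (distant)]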

[ Jaleel et al., ISCA’10 ]
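The diagram can be read as a small state machine per cache line. The sketch below is one possible reading, not the authors' implementation: it assumes a 2-bit value per line, insertion at "far" (2) for SRRIP, promotion to "immediate" (0) on a hit, and aging of the whole set whenever no line is "distant" (3). The class `RRIPSet` and its method names are hypothetical. BRRIP would differ only in the value passed as `insert_rrpv`, which as I understand it is "distant" for most insertions.

```python
RRPV_MAX = 3  # 0 = immediate, 1 = intermediate, 2 = far, 3 = distant

class RRIPSet:
    """One cache set with a re-reference prediction value (RRPV) per way."""

    def __init__(self, ways):
        self.tags = [None] * ways
        self.rrpv = [RRPV_MAX] * ways   # empty ways look "distant", so they fill first

    def access(self, tag, insert_rrpv=2):
        """Return True on a hit; on a miss, install `tag` with `insert_rrpv`."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0   # re-reference: predict immediate
            return True
        way = self._find_victim()
        self.tags[way] = tag
        self.rrpv[way] = insert_rrpv
        return False

    def _find_victim(self):
        while True:
            for way, value in enumerate(self.rrpv):
                if value == RRPV_MAX:
                    return way                     # evict a "distant" line
            # "No Victim": age every line by one step and search again
            self.rrpv = [value + 1 for value in self.rrpv]

demo = RRIPSet(ways=4)
for tag in "abcd":
    demo.access(tag)
demo.access("a")                 # re-reference "a" once
for tag in "xyz":                # a short scan of never-reused lines
    demo.access(tag)
short_scan_ok = demo.access("a") # "a" is still resident
```

In this toy run the re-referenced line survives a three-line scan, because the scan lines inserted at "far" reach "distant" before it does.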


SRRIP Not Always Scan Resistant…

  • LONG scans in access pattern

[Figure: access timeline with hits and misses around a “short” scan and a “long” scan]

  • Active working set MUST be RE-REFERENCED at least ONCE between scans

[Figure: access timeline with hits and misses across three successive scans]

  • Can We Be More Intelligent in Dealing with Scans?


Closer Look at Scan Access Patterns

[Figure: two scans; some of the accessed lines have No Future References, others have a Future Reference]

  • Assuming Perfect Knowledge of Re-Reference Pattern


Improving RRIP on Cache Insertions

[Figure: RRIP state diagram (0 = immediate, 1 = intermediate, 2 = far, 3 = distant) with re-reference and eviction arrows; the insertion of scan lines is marked "Improve Insertion"]

  • Need to Assign DIFFERENT Re-Reference Predictions on Cache Insertion


Focus of this Paper…

  • Goal: Learn re-reference interval of a cache line

[Figure: cache access → PREDICTOR → re-reference prediction]

  • Possible predictions: 0 = immediate, 1 = intermediate, 2 = far, 3 = distant

  • How Best to Learn the Re-Reference Interval?


Learning Re-Reference Behavior

[Figure: two scans whose accesses REFERENCE the SAME MEMORY REGION and are REFERENCED BY a SIMILAR SET OF PCs]

  • Can We Learn Re-References By Correlating Accesses With Some Other Information?



Using Signatures to Correlate Re-Reference

  • Different types of information:

    • Memory Region

    • Memory Instruction PC

    • Instruction Sequence

  • Observation: LLC accesses by the same “signature” tend to have similar re-reference patterns

[Figure: the accesses of two scans grouped under one “signature”]

  • OBSERVE, LEARN and PREDICT Re-Reference Pattern of a Signature
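As a rough illustration of what a signature could look like, the sketch below folds a program counter or a memory-region number down to 14 bits, the signature width quoted with the hardware requirements in this talk. The XOR-folding hash, the region size, and the function names `pc_signature` and `region_signature` are assumptions made for this example; the slides do not specify how signatures are computed.

```python
SIGNATURE_BITS = 14                       # signature width quoted in the talk
SIGNATURE_MASK = (1 << SIGNATURE_BITS) - 1

def pc_signature(pc):
    """Fold a program counter into 14 bits by XOR-ing 14-bit chunks (hypothetical hash)."""
    sig = 0
    while pc:
        sig ^= pc & SIGNATURE_MASK
        pc >>= SIGNATURE_BITS
    return sig

def region_signature(address, region_bits=14):
    """Map an address to its region number (region size is an assumption), then fold it."""
    return pc_signature(address >> region_bits)

same_region = region_signature(0x1000) == region_signature(0x1FFF)
```

Whatever hash is used, the property that matters for the observation above is that accesses from the same instruction or region land on the same signature, so their reuse history is shared.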


Observe Signature Re-Reference Behavior

  • Observe re-reference pattern in the baseline cache

[Figure: an access (Address, Load/Store) reaches the LLC, where each line holds:]

  • Cache Tag

  • Replacement State

  • Coherence State


  • Hardware Required:

    • Was line re-referenced after cache insertion (1 bit)

    • “Signature” responsible for cache insertion (14 bits)

[Figure: an access (Signature, Address, Load/Store) reaches the LLC, where each line gains metadata:]

  • reuse bit

  • signature_insert


Learn Signature Re-Reference Behavior

  • Learn signature re-reference behavior

  • Hardware Required:

    • Signature History Counter Table (SHCT) (16K 2-bit counters)

  • SHCT Training:

    • If evicted line reused:

      SHCT [ signature_insert ] ++

    • If evicted line NOT reused:

      SHCT [ signature_insert ] --

  • counter == 0: signature NOT re-referenced

  • counter != 0: signature re-referenced

[Figure: the SHCT sits beside the Last Level Cache (LLC) and is trained when lines are evicted]
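The training rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the counters saturate at 0 and 3 (natural for 2-bit counters, but not spelled out on the slide), and the initial counter value of 1 is a guess. The names `LineMeta`, `SHCT.train_on_evict`, and `SHCT.predicts_reuse` are hypothetical.

```python
from dataclasses import dataclass

SHCT_ENTRIES = 16 * 1024   # 16K counters, from the slide
COUNTER_MAX = 3            # largest value of a 2-bit counter

@dataclass
class LineMeta:
    """Per-line metadata: the inserting signature and the reuse bit."""
    signature_insert: int
    reuse: bool = False

class SHCT:
    def __init__(self, initial=1):              # initial value is an assumption
        self.counters = [initial] * SHCT_ENTRIES

    def train_on_evict(self, line):
        """Apply the slide's rule when `line` leaves the cache."""
        i = line.signature_insert % SHCT_ENTRIES
        if line.reuse:
            self.counters[i] = min(self.counters[i] + 1, COUNTER_MAX)  # saturating ++
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)            # saturating --

    def predicts_reuse(self, signature):
        """counter != 0 means the signature has been re-referenced."""
        return self.counters[signature % SHCT_ENTRIES] != 0

table = SHCT()
table.train_on_evict(LineMeta(signature_insert=5, reuse=False))
scan_like = table.predicts_reuse(5)
```

With these assumptions a single eviction without reuse is enough to drive a fresh signature's counter to zero, while a signature with a history of reuse needs several such evictions.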


Signature-based Hit Predictor (SHiP)

  • Predict re-reference interval of line using SHCT

[Figure: cache hit/miss and signature → SHiP (SHCT) → re-reference prediction]

  • Possible predictions: 0 = immediate, 1 = intermediate, 2 = far, 3 = distant


Signature-based Hit Predictor (SHiP)

  • Predict re-reference interval using SHCT on CACHE MISS

SHiP Re-Reference Predictions On Miss

    if ( SHCT [ signature ] == 0 )
        predict DISTANT (i.e. 3)
    else
        predict FAR (i.e. 2)

[Figure: cache miss and signature → SHiP → re-reference prediction]


Signature-based Hit Predictor (SHiP)

  • Predict re-reference interval on CACHE HIT

SHiP Re-Reference Predictions On Hit

    Always predict IMMEDIATE (i.e. 0)

[Figure: cache hit and signature → SHiP → re-reference prediction]


SHiP – High Level Architectural Overview

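[Figure: an access (Signature, Address, Access Type) enters the Last Level Cache (LLC), which returns data and hit/miss. Each LLC line stores signature_insert and reuse_bit; together with the LLC hit/miss outcome they drive SHCT Training, and the SHCT supplies the Re-Reference Prediction to the LLC.]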
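The pieces from the preceding slides can be wired together in a toy single-set model. The sketch below is an assumption-laden illustration, not the authors' design: it uses RRIP-style victim selection, the miss and hit predictions from the slides, training at eviction time, an initial counter value of 1, and empty ways filled first. `ShipSet`, the dictionary field names, and the signature values are hypothetical.

```python
IMMEDIATE, FAR, DISTANT = 0, 2, 3
SHCT_ENTRIES = 16 * 1024
COUNTER_MAX = 3

class ShipSet:
    """One cache set: RRIP victim selection with SHiP-style insertion."""

    def __init__(self, ways, shct):
        self.shct = shct                 # shared list of 2-bit counters
        self.lines = [None] * ways       # each line: tag, rrpv, signature_insert, reuse

    def access(self, tag, signature):
        for line in self.lines:
            if line is not None and line["tag"] == tag:
                line["rrpv"] = IMMEDIATE # on a hit, always predict immediate
                line["reuse"] = True
                return True
        way = self._victim()
        if self.lines[way] is not None:
            self._train(self.lines[way]) # learn from the line being evicted
        rrpv = DISTANT if self.shct[signature] == 0 else FAR
        self.lines[way] = {"tag": tag, "rrpv": rrpv,
                           "signature_insert": signature, "reuse": False}
        return False

    def _victim(self):
        for way, line in enumerate(self.lines):
            if line is None:
                return way               # fill empty ways first (assumption)
        while True:
            for way, line in enumerate(self.lines):
                if line["rrpv"] == DISTANT:
                    return way
            for line in self.lines:      # no victim: age every line
                line["rrpv"] += 1

    def _train(self, line):
        i = line["signature_insert"]
        if line["reuse"]:
            self.shct[i] = min(self.shct[i] + 1, COUNTER_MAX)
        else:
            self.shct[i] = max(self.shct[i] - 1, 0)

WS_SIG, SCAN_SIG = 1, 2                  # made-up signatures for the demo
shct = [1] * SHCT_ENTRIES                # initial value is an assumption
cache_set = ShipSet(ways=4, shct=shct)
for tag in ["a", "b", "a", "b"]:
    cache_set.access(tag, WS_SIG)
for i in range(100):                     # a long scan of never-reused lines
    cache_set.access("scan%d" % i, SCAN_SIG)
survived = [cache_set.access(tag, WS_SIG) for tag in ["a", "b"]]
```

In this toy run the first evicted scan line drives the scan signature's counter to zero, later scan lines are inserted as "distant" and replace each other, and the two working-set lines still hit after a 100-line scan.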


  • Per-Line Overhead Can Be Reduced by using Set Sampling (need only 32 to 64 sets)


[Figure: the same overview with Set Sampling, now annotated "~6 KB" and "NO CHANGE"]
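One way to arrive at a figure close to the "~6 KB" annotation is to add the SHCT to the metadata of the sampled sets. The arithmetic below is a back-of-the-envelope sketch: the counter, signature, and reuse-bit widths and the 16-way associativity come from the slides, while the choice of 64 sampled sets (the top of the 32 to 64 range) and the decision to count only these two components are assumptions, so this may not be how the authors computed their number.

```python
SHCT_ENTRIES = 16 * 1024   # from the slides
COUNTER_BITS = 2           # from the slides
SIGNATURE_BITS = 14        # from the slides
REUSE_BITS = 1             # from the slides
WAYS = 16                  # 16-way LLC, from the slides
SAMPLED_SETS = 64          # assumption: upper end of the 32 to 64 range

def bits_to_kib(bits):
    """Convert a bit count to kibibytes."""
    return bits / 8 / 1024

shct_kib = bits_to_kib(SHCT_ENTRIES * COUNTER_BITS)
sampled_kib = bits_to_kib(SAMPLED_SETS * WAYS * (SIGNATURE_BITS + REUSE_BITS))
total_kib = shct_kib + sampled_kib

print(f"SHCT: {shct_kib} KiB, sampled-set metadata: {sampled_kib} KiB, total: {total_kib} KiB")
```

Under these assumptions the counter table dominates the cost, and the per-line metadata in the sampled sets adds less than half as much again.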


Performance Comparison of Replacement Policies

16-way 2MB LLC

Core i7 Type Hierarchy

SHiP Significantly Improves Performance Across All Workload Categories


Performance Comparison of Replacement Policies: CRC Results Comparison

16-way 1MB Private Cache

65 Single-Threaded Workloads

Averaged Across PC Games, Multimedia, Enterprise Server, SPEC CPU2006 Workloads

  • 16-way 4MB Shared Cache

  • 165 4-core Workloads

SHiP Achieves 2X the Performance Improvement of Prior State-of-the-Art Policies


Total Storage Overhead (16-way Set Associative Cache)

  • LRU: 4 bits / cache block

  • Pseudo-LRU: 1 bit / cache block

  • RRIP [ ISCA’10 ]: 2 bits / cache block

  • Seg-LRU [ CRC’10 ]: ~8 bits / cache block

  • SDBP [ MICRO’10 ]: ~10 bits / cache block

  • SHiP [ MICRO’11 ]: ~5 bits / cache block

SHiP Outperforms State-of-the-Art with HW Similar to LRU


Summary

  • Scan-resistance is an important problem in commercial workloads

    • State-of-the-art policies do not fully address scan-resistance

  • Signatures help improve re-reference predictions to address scans

    • Need fine-grained re-reference predictions at insertion

  • Proposed a Simple and Practical Scan-Resistant Replacement Policy

  • SHiP significantly outperforms the winner of the Cache Replacement Championship (CRC)

    • SHiP requires less storage than CRC winner

    • HW overhead of SHiP is comparable to LRU


Q&A


Re-Reference Interval Prediction ( RRIP )

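  • Scan-Resistant insertion

  • Re-reference prediction values: 0 = immediate, 1 = intermediate, 2 = far, 3 = distant

[Figure: state diagram over the four values with insertion, re-reference, and eviction arrows; values 0 to 2 are marked "No Victim"]

  • CAN INSERTION BE MORE INTELLIGENT?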


Using Signatures to Correlate Re-Reference Behavior

  • Example Signatures: Memory Region, Program Counter, Instruction Decode History

[Figure: accesses in two scans labeled with SIGNATUREs a, b, c, d; lines with some signatures see No Future Cache Hits, lines with others see Future Cache Hits]


LRU vs. Re-Reference Interval Prediction (RRIP)

[Figure: an 8-way set (Physical Way # 0 to 7) drawn twice. Under LRU each Cache Tag carries an “LRU Chain” position; under RRIP each Cache Tag carries a Re-Reference Prediction (0 to 3).]

RRIP Outperforms LRU with Storage Less Than LRU


Signature-based Hit Predictor (SHiP)

  • Goal: Predict the re-reference behavior of a signature

  • Learn Re-Reference Behavior:

[Figure: an access (Signature, Address, Access Type) enters the LLC, which returns data and hit/miss]

