SHiP: Signature-based Hit Predictor for High Performance Caching

*Carole-Jean Wu, #Aamer Jaleel, #,+William Hasenplaugh,

*Margaret Martonosi, #Simon Steely Jr., #,+Joel Emer

*Princeton University  #Intel Corporation, VSSAD  +MIT

IEEE/ACM International Symposium on Microarchitecture (MICRO’2011)



Motivation

  • Factors making caching important

    • Increasing ratio of CPU speed to memory speed

    • Multi-core processors make shared cache management more challenging

  • LRU has been the standard LLC replacement policy

    • However, LRU has problems!



Problems with LRU Replacement

  • Working set larger than the cache causes thrashing

[Figure: a working set (Wsize) larger than the LLC (LLCsize); every access misses]

  • References to non-temporal data (scans) discard the frequently referenced working set

[Figure: a frequently referenced working set (Wsize) in the LLC (LLCsize) interleaved with scans; the scans cause working-set misses]

  • Scans occur frequently in commercial workloads
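Both LRU failure modes can be reproduced with a minimal Python sketch. It models a single fully associative set with LRU replacement (a simplification; a real LLC is set associative), and the names `LRUCache` and `hits` and the traces are hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict


class LRUCache:
    """A single fully associative set managed with LRU replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines: OrderedDict[str, None] = OrderedDict()  # LRU first, MRU last

    def access(self, tag: str) -> bool:
        """Return True on a hit; on a miss, evict the LRU line if full and fill."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # promote to the MRU position
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None
        return False


def hits(cache: LRUCache, trace: list[str]) -> int:
    """Count the hits a trace receives in `cache`."""
    return sum(cache.access(tag) for tag in trace)


# Thrashing: a cyclic 5-line working set never hits in a 4-line LRU cache.
print(hits(LRUCache(4), ["a", "b", "c", "d", "e"] * 4))  # 0

# Scan: a 3-line working set fits, but a 4-line scan pushes it out.
cache = LRUCache(4)
hits(cache, ["a", "b", "c"] * 2)          # warm up the working set
hits(cache, ["s1", "s2", "s3", "s4"])     # non-temporal scan
print(hits(cache, ["a", "b", "c"]))       # 0: the working set was discarded
```

In both cases LRU inserts every new line at the MRU position, so lines that will never be reused push out lines that would have been.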



Desired Behavior from Cache Replacement

  • Working set larger than the cache → Preserve some of the working set in the cache

[Figure: a working set (Wsize) larger than the LLC (LLCsize); the preserved part of the working set hits and the rest misses]

[ DIP (ISCA’07), DRRIP (ISCA’10) achieve this effect ]

  • Recurring scans → Preserve the frequently referenced working set in the cache

[Figure: a working set interleaved with recurring scans; the working set keeps hitting]

[ SRRIP (ISCA’10) achieves this effect ]



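Dynamic Re-Reference Interval Prediction (DRRIP)

[Figure: RRIP state diagram over the 2-bit re-reference predictions 0: immediate, 1: intermediate, 2: far, 3: distant. SRRIP (scan-resistant) inserts new lines at 2 (far); BRRIP (thrash-resistant) inserts most new lines at 3 (distant). A re-reference moves a line to 0; when no line is at 3 ("No Victim"), every line ages by one step; a line at 3 is chosen for eviction.]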

[ Jaleel et al., ISCA’10 ]
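The state diagram can be sketched as one RRIP-managed set in Python, under stated assumptions: 2-bit RRPVs, a hit moves the line to 0 (immediate), invalid ways are filled first, the leftmost line at 3 (distant) is the victim, and BRRIP inserts at far with a small probability `epsilon` (1/32 here, an assumed value). DRRIP's set dueling between SRRIP and BRRIP is omitted, and `RRIPSet` is a hypothetical name.

```python
from __future__ import annotations

import random

MAX_RRPV = 3  # 2-bit RRPV: 0 immediate, 1 intermediate, 2 far, 3 distant


class RRIPSet:
    """One cache set managed by SRRIP or BRRIP (hit promotes to immediate)."""

    def __init__(self, ways: int, policy: str = "SRRIP",
                 epsilon: float = 1 / 32, seed: int = 0) -> None:
        self.tags: list[str | None] = [None] * ways
        self.rrpv = [MAX_RRPV] * ways
        self.policy = policy
        self.epsilon = epsilon           # BRRIP: chance of a "far" insertion
        self.rng = random.Random(seed)

    def _insertion_rrpv(self) -> int:
        if self.policy == "SRRIP":
            return MAX_RRPV - 1          # scan-resistant: insert at "far"
        # BRRIP (thrash-resistant): insert at "distant", rarely at "far"
        return MAX_RRPV - 1 if self.rng.random() < self.epsilon else MAX_RRPV

    def _find_victim(self) -> int:
        if None in self.tags:
            return self.tags.index(None)  # fill an invalid way first
        while MAX_RRPV not in self.rrpv:
            self.rrpv = [v + 1 for v in self.rrpv]  # "No Victim": age all lines
        return self.rrpv.index(MAX_RRPV)  # leftmost "distant" line is evicted

    def access(self, tag: str) -> bool:
        """Return True on a hit, False on a miss (after filling the line)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0  # re-reference: "immediate"
            return True
        way = self._find_victim()
        self.tags[way] = tag
        self.rrpv[way] = self._insertion_rrpv()
        return False


s = RRIPSet(4, "SRRIP")
for t in ["a", "b", "c", "d", "a", "e"]:
    s.access(t)
print(s.tags, s.rrpv)  # ['a', 'e', 'c', 'd'] [1, 2, 3, 3]
```

With BRRIP and `epsilon=0`, every new line enters at 3 and is the first to be replaced, which is what keeps part of a thrashing working set resident.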



SRRIP Not Always Scan Resistant…

  • LONG scans in access pattern

[Figure: an access pattern containing a “short” scan and a “long” scan; the working set hits after the short scan but misses after the long scan]

  • Active working set MUST be RE-REFERENCED at least ONCE between scans

[Figure: a working set interleaved with recurring scans; only working-set lines re-referenced between scans hit]

  • Can We Be More Intelligent in Dealing with Scans?
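The scan-length limit can be checked with a self-contained SRRIP simulator. This is a sketch: `srrip_hits` and the traces are hypothetical, and a single 4-way set stands in for the LLC. Both traces re-reference a 2-line working set once before the scan; only the scan length differs.

```python
from __future__ import annotations

MAX_RRPV = 3  # 2-bit RRPV: 0 immediate .. 3 distant


def srrip_hits(ways: int, trace: list[str]) -> list[bool]:
    """Simulate one SRRIP set and return a hit (True) / miss (False) per access."""
    tags: list[str | None] = [None] * ways
    rrpv = [MAX_RRPV] * ways
    result = []
    for tag in trace:
        if tag in tags:
            rrpv[tags.index(tag)] = 0            # re-reference: predict immediate
            result.append(True)
            continue
        if None in tags:
            way = tags.index(None)               # fill an invalid way first
        else:
            while MAX_RRPV not in rrpv:          # no victim: age every line
                rrpv = [v + 1 for v in rrpv]
            way = rrpv.index(MAX_RRPV)
        tags[way] = tag
        rrpv[way] = MAX_RRPV - 1                 # SRRIP inserts at "far"
        result.append(False)
    return result


ws = ["a", "b"]                                  # active working set
short_scan = ["s1", "s2"]
long_scan = [f"s{i}" for i in range(1, 9)]

# The working set is re-referenced (ws + ws) before each scan.
print(srrip_hits(4, ws + ws + short_scan + ws)[-2:])  # [True, True]
print(srrip_hits(4, ws + ws + long_scan + ws)[-2:])   # [False, False]
```

Each long-scan miss that finds no line at 3 ages every line, so after enough scan lines the re-referenced working set also reaches 3 and is evicted.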



Closer Look at Scan Access Patterns

[Figure: two scans; some of the lines they insert have no future references, while others have a future reference]

  • Assuming Perfect Knowledge of Re-Reference Pattern



Improving RRIP on Cache Insertions

[Figure: RRIP state diagram (0: immediate, 1: intermediate, 2: far, 3: distant; re-reference, "No Victim" aging, eviction), highlighting the insertion of scan lines as the step to improve]

  • Need to Assign DIFFERENT Re-Reference Predictions on Cache Insertion



Focus of this Paper…

  • Goal: Learn re-reference interval of a cache line

[Figure: each cache access goes to a PREDICTOR, which outputs a re-reference prediction: 0: immediate, 1: intermediate, 2: far, 3: distant]

  • How Best to Learn the Re-Reference Interval?



Learning Re-Reference Behavior

[Figure: two scans whose accesses REFERENCE the SAME MEMORY REGION and are REFERENCED BY a SIMILAR SET OF PCs]

  • Can We Learn Re-References By Correlating Accesses With Some Other Information?





Using Signatures to Correlate Re-Reference

  • Different types of information:

    • Memory Region

    • Memory Instruction PC

    • Instruction Sequence

  • Observation: LLC accesses by the same “signature” tend to have similar re-reference patterns

[Figure: scan accesses grouped by a common “signature”]

  • OBSERVE, LEARN, and PREDICT the Re-Reference Pattern of a Signature
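A sketch of how two of these kinds of information could be reduced to the 14-bit signature used later. The XOR-fold hash, the 16 KB region size (`REGION_SHIFT = 14`), and all function names are assumptions for illustration; instruction-sequence signatures are not shown.

```python
SIGNATURE_BITS = 14                       # signature width from the slides
SIGNATURE_MASK = (1 << SIGNATURE_BITS) - 1
REGION_SHIFT = 14                         # hypothetical: 16 KB memory regions


def fold_hash(value: int) -> int:
    """XOR-fold a non-negative integer down to SIGNATURE_BITS bits."""
    result = 0
    while value:
        result ^= value & SIGNATURE_MASK
        value >>= SIGNATURE_BITS
    return result


def pc_signature(pc: int) -> int:
    """Signature from the PC of the memory instruction."""
    return fold_hash(pc)


def region_signature(address: int) -> int:
    """Signature from the memory region that contains `address`."""
    return fold_hash(address >> REGION_SHIFT)


# Two loads to the same 16 KB region share a memory-region signature.
print(region_signature(0x7F001000) == region_signature(0x7F002FC0))  # True
```

Any hash that maps related accesses to the same small index would serve; folding keeps every address bit in play without a multiplier.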



Observe Signature Re-Reference Behavior

  • Observe re-reference pattern in the baseline cache

[Figure: accesses (address, load/store) arrive at the LLC; each line stores a cache tag, replacement state, and coherence state]



Observe Signature Re-Reference Behavior

  • Observe re-reference pattern in the baseline cache

  • Hardware Required:

    • Whether the line was re-referenced after cache insertion (1 bit)

    • “Signature” responsible for cache insertion (14 bits)

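[Figure: the same LLC, with each line's metadata extended by a reuse bit and signature_insert; accesses now carry a signature in addition to the address and load/store type]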
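The added per-line state can be sketched as a small record. This is illustrative only: `LineMetadata`, `on_fill`, and `on_hit` are hypothetical names, and the masking mirrors the 14-bit signature width above.

```python
from dataclasses import dataclass

SIGNATURE_MASK = (1 << 14) - 1  # 14-bit signature, as on the slide


@dataclass
class LineMetadata:
    """Per-line state SHiP adds to the baseline tag/replacement/coherence bits."""
    signature_insert: int   # signature of the access that inserted the line
    reuse: bool = False     # 1 bit: re-referenced since insertion?


def on_fill(signature: int) -> LineMetadata:
    """Cache miss: remember who inserted the line; it has not been reused yet."""
    return LineMetadata(signature_insert=signature & SIGNATURE_MASK)


def on_hit(meta: LineMetadata) -> None:
    """Cache hit: the line was re-referenced after insertion."""
    meta.reuse = True


meta = on_fill(0x1ABCD)
print(hex(meta.signature_insert), meta.reuse)  # 0x2bcd False
on_hit(meta)
print(meta.reuse)  # True
```

Together the two fields cost 15 bits per line, which is why restricting them to a few sampled sets matters.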



Learn Signature Re-Reference Behavior

  • Learn signature re-reference behavior

  • Hardware Required:

    • Signature History Counter Table (SHCT) (16K 2-bit counters)

  • SHCT Training:

    • If evicted line reused (counter saturates at 3):

      SHCT [ signature_insert ] ++

    • If evicted line NOT reused (counter saturates at 0):

      SHCT [ signature_insert ] --

  • SHCT counter meaning:

    • counter = 0: signature NOT re-referenced

    • counter != 0: signature re-referenced

[Figure: the SHCT is trained by lines evicted from the Last Level Cache (LLC)]
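A sketch of the SHCT and its training rule as described on the slide: a 16K-entry table of 2-bit saturating counters (16K × 2 bits = 4 KB) indexed by `signature_insert`. The initial counter value of 1 and the modulo indexing are assumptions, since the slides do not specify them; the class and method names are hypothetical.

```python
SHCT_ENTRIES = 16 * 1024   # 16K entries
COUNTER_MAX = 3            # 2-bit saturating counters


class SHCT:
    """Signature History Counter Table trained on LLC evictions."""

    def __init__(self, entries: int = SHCT_ENTRIES, initial: int = 1) -> None:
        self.counters = [initial] * entries  # initial value is an assumption

    def _index(self, signature: int) -> int:
        return signature % len(self.counters)

    def train_on_eviction(self, signature_insert: int, reused: bool) -> None:
        i = self._index(signature_insert)
        if reused:
            self.counters[i] = min(self.counters[i] + 1, COUNTER_MAX)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)

    def re_referenced(self, signature: int) -> bool:
        """counter != 0 means lines from this signature tend to be reused."""
        return self.counters[self._index(signature)] != 0


shct = SHCT()
shct.train_on_eviction(42, reused=False)  # scan-like line evicted unused
print(shct.re_referenced(42))             # False
```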



Signature-based Hit Predictor (SHiP)

  • Predict re-reference interval of line using SHCT

[Figure: on each cache hit or miss, the signature indexes the SHCT and SHiP outputs a re-reference prediction: 0: immediate, 1: intermediate, 2: far, 3: distant]



Signature-based Hit Predictor (SHiP)

  • Predict re-reference interval using SHCT on CACHE MISS

SHiP Re-Reference Predictions On Miss

    if ( SHCT [ signature ] == 0 )
        predict DISTANT (i.e. 3)
    else
        predict FAR (i.e. 2)

[Figure: on a cache miss, the signature indexes the SHCT and SHiP outputs a re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]
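The miss-path decision can be written as a small Python function. This is a sketch; `predict_on_miss` and the toy table values are hypothetical.

```python
from __future__ import annotations

IMMEDIATE, INTERMEDIATE, FAR, DISTANT = 0, 1, 2, 3  # 2-bit RRPV encodings


def predict_on_miss(shct: list[int], signature: int) -> int:
    """Insertion RRPV for a line brought in by `signature` on a cache miss."""
    if shct[signature % len(shct)] == 0:
        return DISTANT  # lines from this signature were not re-referenced
    return FAR          # otherwise insert like SRRIP


shct = [0, 2, 3, 1]           # a toy 4-entry SHCT (hypothetical values)
print([predict_on_miss(shct, sig) for sig in range(4)])  # [3, 2, 2, 2]
```

Compared with SRRIP, the only change is that signatures with a zero counter are demoted from far to distant at insertion.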



Signature-based Hit Predictor (SHiP)

  • Predict re-reference interval on CACHE HIT

SHiP Re-Reference Predictions On Hit

    Always predict IMMEDIATE (i.e. 0)

[Figure: on a cache hit, SHiP predicts 0: immediate for the line regardless of its signature (other predictions: 1: intermediate, 2: far, 3: distant)]
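Putting the pieces together, here is a self-contained sketch of one 4-way set that applies the hit and miss predictions above and trains the SHCT on evictions. Assumptions: signatures are small integers standing in for PC hashes, the SHCT has 16 entries initialized to 1, invalid ways are filled first, and `use_ship=False` falls back to SRRIP's fixed far insertion for comparison. The trace re-references a 2-line working set once and then runs one 8-line scan. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_RRPV = 3      # 2-bit RRPV: 0 immediate .. 3 distant
COUNTER_MAX = 3   # 2-bit saturating SHCT counters


@dataclass
class Line:
    tag: str
    signature_insert: int
    rrpv: int
    reuse: bool = False


class SHiPSet:
    """One RRIP-managed set; SHiP picks the insertion RRPV (use_ship=False: SRRIP)."""

    def __init__(self, ways: int, shct: list[int], use_ship: bool = True) -> None:
        self.ways: list[Line | None] = [None] * ways
        self.shct = shct
        self.use_ship = use_ship

    def _victim(self) -> int:
        if None in self.ways:
            return self.ways.index(None)          # fill an invalid way first
        while True:
            for i, line in enumerate(self.ways):
                if line.rrpv == MAX_RRPV:
                    return i                      # leftmost "distant" line
            for line in self.ways:
                line.rrpv += 1                    # no victim: age every line

    def _train(self, line: Line) -> None:
        i = line.signature_insert % len(self.shct)
        if line.reuse:
            self.shct[i] = min(self.shct[i] + 1, COUNTER_MAX)
        else:
            self.shct[i] = max(self.shct[i] - 1, 0)

    def access(self, tag: str, signature: int) -> bool:
        for line in self.ways:
            if line is not None and line.tag == tag:
                line.rrpv = 0                     # hit: always predict immediate
                line.reuse = True
                return True
        way = self._victim()
        evicted = self.ways[way]
        if evicted is not None:
            self._train(evicted)                  # SHCT learns from the eviction
        if self.use_ship and self.shct[signature % len(self.shct)] == 0:
            rrpv = MAX_RRPV                       # miss: predict distant
        else:
            rrpv = MAX_RRPV - 1                   # miss: predict far
        self.ways[way] = Line(tag, signature, rrpv)
        return False


def final_working_set_hits(use_ship: bool) -> list[bool]:
    ws_sig, scan_sig = 1, 2                       # hypothetical PC signatures
    s = SHiPSet(4, shct=[1] * 16, use_ship=use_ship)
    ws = ["a", "b"]
    for tag in ws + ws:
        s.access(tag, ws_sig)
    for tag in [f"s{i}" for i in range(1, 9)]:    # one long scan
        s.access(tag, scan_sig)
    return [s.access(tag, ws_sig) for tag in ws]


print(final_working_set_hits(use_ship=False))  # [False, False]
print(final_working_set_hits(use_ship=True))   # [True, True]
```

With SRRIP the long scan ages the working set out. With SHiP, the first scan line evicted without reuse drives the scan signature's counter to 0, so later scan lines enter at 3 and replace each other instead of the working set.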



SHiP – High Level Architectural Overview

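  • Per-line overhead can be reduced by using set sampling (need only 32-64 sets)

[Figure: each access (address, access type, signature) goes to the Last Level Cache (LLC), which returns data and hit/miss. SHiP reads the SHCT (~6 KB) to make the re-reference prediction and trains it on LLC hits/misses using each sampled line's signature_insert and reuse_bit; the rest of the LLC: NO CHANGE]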
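The ~6 KB figure is consistent with a back-of-the-envelope count, assuming 64 sampled sets in a 16-way LLC, each sampled line storing a 14-bit `signature_insert` and a 1-bit `reuse_bit`, plus the 16K-entry, 2-bit SHCT. Which components the slide's figure actually includes is an assumption.

```python
SHCT_ENTRIES = 16 * 1024      # 16K SHCT entries
SHCT_COUNTER_BITS = 2         # 2-bit counters
SAMPLED_SETS = 64             # upper end of the 32-64 sampled sets
WAYS = 16                     # 16-way LLC
SIGNATURE_BITS = 14           # signature_insert per sampled line
REUSE_BITS = 1                # reuse_bit per sampled line

shct_bits = SHCT_ENTRIES * SHCT_COUNTER_BITS
sampled_bits = SAMPLED_SETS * WAYS * (SIGNATURE_BITS + REUSE_BITS)
total_kb = (shct_bits + sampled_bits) / 8 / 1024

print(shct_bits, sampled_bits)  # 32768 15360
print(total_kb)                 # 5.875
```

The SHCT dominates (4 KB); sampling keeps the per-line fields to under 2 KB regardless of LLC size.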



Performance Comparison of Replacement Policies

[Figure: performance of replacement policies on a 16-way 2MB LLC in a Core i7 type hierarchy]

SHiP Significantly Improves Performance Across All Workload Categories



Performance Comparison of Replacement Policies: CRC Results Comparison

[Figure: CRC results. 16-way 1MB private cache: 65 single-threaded workloads, averaged across PC games, multimedia, enterprise server, and SPEC CPU2006 workloads. 16-way 4MB shared cache: 165 4-core workloads.]

SHiP Delivers 2X the Performance Improvement of Prior State-of-the-Art Policies



Total Storage Overhead (16-way Set Associative Cache)

  • LRU: 4 bits / cache block

  • Pseudo-LRU: 1 bit / cache block

  • RRIP [ ISCA’10 ]: 2 bits / cache block

  • Seg-LRU [ CRC’10 ]: ~8 bits / cache block

  • SDBP [ MICRO’10 ]: ~10 bits / cache block

  • SHiP [ MICRO’11 ]: ~5 bits / cache block

SHiP Outperforms State-of-the-Art with HW Similar to LRU
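A rough derivation of these per-block numbers, under assumptions the slide does not state: a 16-way cache with 64-byte lines, tree pseudo-LRU with 15 bits per set, 2-bit RRIP, and SHiP's 2 RRIP bits per block plus its SHCT (16K × 2 bits) and 64 sampled sets of 15-bit metadata, amortized over the 16,384 blocks of the 1MB private cache used in the CRC comparison.

```latex
\begin{aligned}
\text{LRU:}\quad & \log_2 16 = 4 \ \text{bits/block} \\
\text{Tree pseudo-LRU:}\quad & \tfrac{16-1}{16} \approx 0.94 \approx 1 \ \text{bit/block} \\
\text{RRIP } (M = 2):\quad & 2 \ \text{bits/block} \\
\text{SHiP:}\quad & 2 + \frac{16384 \cdot 2 + 64 \cdot 16 \cdot (14 + 1)}{16384}
  = 2 + \frac{48128}{16384} \approx 4.94 \approx 5 \ \text{bits/block}
\end{aligned}
```

For a larger cache the fixed SHCT and sampled-set cost is spread over more blocks, so the per-block overhead would shrink toward RRIP's 2 bits.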



Summary

  • Scan-resistance is an important problem in commercial workloads

    • State-of-the-art policies do not fully address scan-resistance

  • Signatures help improve re-reference predictions to address scans

    • Need fine-grained re-reference predictions at insertion

  • Proposed a Simple and Practical Scan-Resistant Replacement Policy

  • SHiP significantly outperforms the winner of the Cache Replacement Championship (CRC)

    • SHiP requires less storage than the CRC winner

    • HW overhead of SHiP is comparable to LRU


Q&A




Re-Reference Interval Prediction (RRIP)

CAN INSERTION BE MORE INTELLIGENT?

[Figure: RRIP state diagram over re-reference predictions 0: immediate, 1: intermediate, 2: far, 3: distant, showing scan-resistant insertion, re-reference (promotion to 0), aging when there is "No Victim", and eviction]



Using Signatures to Correlate Re-Reference Behavior

  • Example signatures: Memory Region, Program Counter, Instruction Decode History

[Figure: accesses labeled by SIGNATURE (a, b, c, d) around two scans; lines from some signatures see no future cache hits, while lines from others see future cache hits]



LRU vs. Re-Reference Interval Prediction (RRIP)

[Figure: the same 8-way set under LRU, which tracks each line's "LRU chain" position (0-7), and under RRIP, which stores a 2-bit re-reference prediction per line (0-3)]

RRIP Outperforms LRU with Storage Less Than LRU



Signature-based Hit Predictor (SHiP)

  • Goal: Predict the re-reference behavior of a signature

  • Learn Re-Reference Behavior:

[Figure: each LLC access carries a signature in addition to its address and access type; the LLC returns data and hit/miss]

