Token Coherence for CMPs: Enhancing Performance and Complexity Management

Token Coherence for CMPs
Mike Marty
CS 838
12/09/2003

Outline
• Motivation for Token Coherence
• Token Coherence 101
• Issues with CMPs
• Maintaining correctness
• When to retry?
• Persistent requests

Motivation #1: Performance Tradeoffs
• Snooping
  • Low latency (w/o network contention)
  • Bandwidth doesn’t scale
  • Requires ordered interconnect
• Directory
  • No broadcast == scalability
  • Enables unordered interconnects
  • Indirections increase latency
• Can we get the best of both worlds?

Motivation #2: Complexity
• Even basic coherence protocols are hard to get right
• Future sources of additional complexity
  • Prediction (e.g. Destination-Set Prediction)
  • Hierarchy
  • CMPs
  • DNUCA

Complexity Example
[Figure: six processors (P), two L2 caches, and a Dir/Mem node exchanging numbered GETX, Fwd, Inv, WB, and DATA messages, with cached copies in S and O states]

Token Coherence
• Global Invariant
  • For each block, allow either one writer or multiple readers
• Enforce Locally
  • Each (logical) block has T tokens (initially at memory)
  • Tokens are never created or destroyed
  • Components exchange tokens & data
  • All tokens to write <==> one-writer invariant
  • At least one token to read <==> multiple-readers invariant
Invariants are enforced explicitly, rather than through a grab-bag of invalidations, acks, & nacks
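The token-counting rules on this slide can be sketched in a few lines. This is a minimal illustration only, not the protocol implementation: the names `Component`, `transfer`, and `tokens_conserved` are hypothetical, T = 16 is an assumed value, and the data that travels with tokens is left out.

```python
from dataclasses import dataclass

T = 16  # tokens per block; assumed value, the slide only says "T"


@dataclass
class Component:
    """A cache or memory module holding some of one block's tokens."""
    name: str
    tokens: int = 0

    def can_read(self) -> bool:
        # At least one token permits reading.
        return self.tokens >= 1

    def can_write(self) -> bool:
        # Holding all T tokens means no other component can read or write.
        return self.tokens == T


def transfer(src: Component, dst: Component, count: int) -> None:
    """Move tokens between components; tokens are never created or destroyed."""
    if not 0 < count <= src.tokens:
        raise ValueError(f"{src.name} cannot send {count} token(s)")
    src.tokens -= count
    dst.tokens += count


def tokens_conserved(components) -> bool:
    """Check that exactly T tokens exist for the block."""
    return sum(c.tokens for c in components) == T


# All tokens start at memory; two caches each obtain one token to read.
memory = Component("memory", tokens=T)
cache0 = Component("cache0")
cache1 = Component("cache1")
transfer(memory, cache0, 1)
transfer(memory, cache1, 1)
```

Because each permission check looks only at the local token count, no component needs global knowledge of who else holds the block.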

Starvation Avoidance
• Purpose: handle pathological cases
• Detect possible starvation
  • E.g. > 4 retries issued
• Invoke Persistent Request
  • Starving processor issues the persistent request
  • Fair arbitration scheme activates request
  • Request persists at each processor until satisfied
  • Deactivate upon completion
• Persistent Requests should be rare
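The escalation from transient retries to a persistent request might look like the sketch below. It reads "> 4 retries issued" as "escalate once a fifth retry has also failed", which is one interpretation. The function `acquire` and its callback parameters are hypothetical names, and arbitration and deactivation are not modeled.

```python
RETRY_LIMIT = 4  # from the slide's example threshold "> 4 retries issued"


def acquire(send_transient, send_persistent, retry_limit=RETRY_LIMIT):
    """Try transient requests first; fall back to a persistent request.

    send_transient: callable returning True if the request was satisfied.
    send_persistent: callable that issues the persistent request, which is
        assumed to succeed eventually.
    Returns (kind, retries), where kind is "transient" or "persistent".
    """
    retries = 0
    satisfied = send_transient()  # the initial request is not a retry
    while not satisfied:
        if retries > retry_limit:
            # Possible starvation detected: stop retrying and escalate.
            send_persistent()
            return "persistent", retries
        retries += 1
        satisfied = send_transient()
    return "transient", retries
```

In the common case the first transient request succeeds and the function returns `("transient", 0)`, consistent with the slide's point that persistent requests should be rare.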

Token Coherence Generalizes
• Correctness Substrate
  • Safety with token counting
  • Starvation avoidance with persistent requests
• Performance Protocol
  • Make the common case fast
  • Alternatives:
    • TokenB – broadcast
    • TokenM – multicast with prediction
    • TokenD – soft-state directory for a large system
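One way to picture the three performance-protocol alternatives is by where each sends a transient request. The sketch below is a rough illustration under assumptions: the function names are hypothetical, the home node is assumed to be a memory node distinct from the requester, and including the home node in the TokenM set is an assumption rather than something stated on the slide.

```python
def token_b_destinations(requester, nodes, home, predicted=()):
    """TokenB: broadcast to every node except the requester."""
    return set(nodes) - {requester}


def token_m_destinations(requester, nodes, home, predicted=()):
    """TokenM: multicast to a predicted set (home node added by assumption)."""
    return (set(predicted) | {home}) - {requester}


def token_d_destinations(requester, nodes, home, predicted=()):
    """TokenD: send only to the home node, whose soft-state directory
    would then forward the request (forwarding is not modeled here)."""
    return {home}
```

In all three cases the correctness substrate is unchanged: a request that reaches too few nodes simply fails to collect enough tokens and is reissued.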

Token Coherence for CMPs
• Ensuring safety with tokens
  • Tokens to individual caches
• Exploiting CMP locality
  • Extra tokens to chip on initial requests
• Scaling Persistent Requests

2-Level Directory vs. TokenCMP
[Figure labels: processors (P), Local Dir, Global Dir; caption: "TokenCMP logically"]

TokenCMP Reissues
• Transient requests can fail
  • Coherence race
  • Network contention
• When to reissue?
• Reissue transient or persistent request?

TokenCMP Persistent Requests
• Recall:
  • Anything that holds/sends/receives tokens must remember all outstanding persistent requests
• Scale persistent requests via hierarchy
  • One outstanding persistent request per chip
• Achieve performance via locality
  • Satisfy all local persistent requests first
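The bookkeeping implied by "one outstanding persistent request per chip" could be sketched as a small table kept by every component that handles tokens. Everything here is hypothetical: the class and method names are invented for illustration, and picking the lowest chip id when several requests target the same block is an assumed stand-in for the fair arbitration scheme. Satisfying local requests first is not modeled.

```python
class PersistentRequestTable:
    """Per-component record of active persistent requests, one per chip."""

    def __init__(self):
        self._active = {}  # chip_id -> (block_addr, requester)

    def activate(self, chip_id, block_addr, requester):
        # One outstanding persistent request per chip bounds the table size.
        if chip_id in self._active:
            raise RuntimeError(f"chip {chip_id} already has a persistent request")
        self._active[chip_id] = (block_addr, requester)

    def deactivate(self, chip_id):
        # Called when the requester reports completion.
        self._active.pop(chip_id, None)

    def forward_target(self, block_addr):
        """Return who should receive tokens for this block, or None.

        Assumed tie-break: the lowest chip id wins.
        """
        for chip_id in sorted(self._active):
            addr, requester = self._active[chip_id]
            if addr == block_addr:
                return requester
        return None
```

With this layout, table size grows with the number of chips rather than the number of processors, which is the scaling benefit the slide points to.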

Conclusion
• CMPs move the complexity from the processors to the memory system
• Token Coherence reduces complexity by separating performance from correctness
• CMP performance largely depends on coherence
• Token Coherence gives designers more performance tradeoffs

Token Coherence Example
[Figure sequence: three processors P0, P1, P2; T = tokens held, Max Tokens = 16]
• Initial state: P0 T=0, P1 T=0, P2 T=16 (R/W)
• P0 issues a request to write (delayed to P2)
• P1 issues a request to read
• P2 responds with data to P1 (T=1); now P1 T=1 (R), P2 T=15 (R)
• P0’s delayed request arrives at P2
• P2 responds to P0 (T=15); now P0 T=15 (R), P1 T=1 (R), P2 T=0
• Now what? (P0 wants all tokens)
• Timeout! P0 reissues request
• P1 responds with a token (T=1)
• P0’s request completed: P0 T=16 (R/W), P1 T=0, P2 T=0
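The race in this example can be replayed as plain token arithmetic. The sketch below is only a bookkeeping aid under stated assumptions: the helper names `send` and `permission` are hypothetical, and message timing, data transfer, and the timeout mechanism are not modeled.

```python
TOTAL = 16  # Max Tokens in the example
tokens = {"P0": 0, "P1": 0, "P2": TOTAL}  # initial state: P2 holds all tokens


def send(src, dst, count):
    """Move tokens from src to dst and check that none were lost or created."""
    assert 0 < count <= tokens[src], f"{src} cannot send {count} token(s)"
    tokens[src] -= count
    tokens[dst] += count
    assert sum(tokens.values()) == TOTAL


def permission(proc):
    """Map a token count to the access right shown in the figure."""
    if tokens[proc] == TOTAL:
        return "R/W"
    return "R" if tokens[proc] >= 1 else "none"


# P1's read request reaches P2 first: P2 gives up one token.
send("P2", "P1", 1)
# P0's delayed write request arrives: P2 hands over its remaining 15 tokens.
send("P2", "P0", 15)
after_race = dict(tokens)             # P0 is one token short of writing
stuck_permission = permission("P0")   # readable, but not writable
# After the timeout P0 reissues its request and P1 returns its token.
send("P1", "P0", 1)
```

After the first two transfers P0 holds 15 of 16 tokens, so it can read but not write, which is the "Now what?" state; the reissued request collects the last token.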
