Token Coherence for CMPs: Enhancing Performance and Complexity Management
Explore token coherence for CMPs, addressing performance tradeoffs and complexity challenges. Understand the benefits of token-based protocols in maintaining correctness and avoiding starvation. Learn how token coherence generalizes coherence solutions. Delve into examples and strategies for scaling persistent requests in CMP systems.
Token Coherence for CMPs Mike Marty CS 838 12/09/2003
Outline • Motivation for Token Coherence • Token Coherence 101 • Issues with CMPs • Maintaining correctness • When to retry? • Persistent requests
Motivation #1: Performance Tradeoffs • Snooping • Low latency (w/o network contention) • Bandwidth doesn’t scale • Requires ordered interconnect • Directory • No broadcast == scalability • Enables unordered interconnects • Indirections increase latency • Can we get best of both worlds?
Motivation #2: Complexity • Even basic coherence protocols are hard to get right • Future sources of additional complexity • Prediction • e.g. Destination-Set Prediction • Hierarchy • CMPs • DNUCA
Complexity Example • [Figure: processors (P), two L2 caches, and Dir/Mem exchanging numbered GETX, Fwd, Inv, WB, and DATA messages; cached copies are in S and O states]
Token Coherence • Global Invariant • For each block, allow either one writer or multiple readers • Enforce Locally • Each (logical) block has T tokens (initially at memory) • Tokens are never created or destroyed • Components exchange tokens & data • All tokens to write <==> one-writer invariant • At least one token to read <==> multiple-readers invariant • Invariants are enforced explicitly, rather than by a grab-bag of invalidations, acks, & nacks
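The token-counting rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the protocol from the talk: the names `Holder`, `transfer`, and `check_invariant` are hypothetical, T = 16 is an assumed value, and the sketch tracks token counts only (it ignores the data that travels with tokens).

```python
from dataclasses import dataclass

T = 16  # tokens per block; an assumed value for this sketch


@dataclass
class Holder:
    """A cache or memory that can hold tokens for one block (hypothetical)."""
    name: str
    tokens: int = 0

    def can_read(self) -> bool:
        # At least one token is needed to read.
        return self.tokens >= 1

    def can_write(self) -> bool:
        # All T tokens are needed to write.
        return self.tokens == T


def transfer(src: Holder, dst: Holder, count: int) -> None:
    """Move tokens between holders; tokens are never created or destroyed."""
    if count <= 0 or count > src.tokens:
        raise ValueError(f"{src.name} cannot send {count} token(s)")
    src.tokens -= count
    dst.tokens += count


def check_invariant(holders) -> None:
    """Check conservation and the one-writer-or-many-readers invariant."""
    assert sum(h.tokens for h in holders) == T
    writers = [h for h in holders if h.can_write()]
    readers = [h for h in holders if h.can_read()]
    # If some holder can write, it is the only holder that can read.
    assert not writers or len(readers) == 1


memory, p0, p1 = Holder("memory", T), Holder("P0"), Holder("P1")
holders = [memory, p0, p1]

transfer(memory, p0, 1)   # P0 becomes a reader
transfer(memory, p1, 1)   # P1 becomes a reader
check_invariant(holders)

transfer(memory, p0, 14)  # P0 collects the remaining tokens...
transfer(p1, p0, 1)       # ...including P1's, so P0 may now write
check_invariant(holders)
```

Because each holder decides locally from its own token count, no holder needs to know who else is sharing the block.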
Starvation Avoidance • Purpose: handle pathological cases • Detect possible starvation • E.g. > 4 retries issued • Invoke Persistent Request • Starving processor issues the persistent request • Fair arbitration scheme activates request • Request persists at each processor until satisfied • Deactivate upon completion • Persistent Requests should be rare
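The escalation from retries to a persistent request might look like the following sketch. The function `acquire` and its callback parameters are hypothetical; the threshold of 4 comes from the slide's example, and the arbitration and deactivation steps are left out.

```python
from typing import Callable, Tuple

MAX_RETRIES = 4  # threshold taken from the slide's "> 4 retries" example


def acquire(
    send_transient: Callable[[], bool],
    send_persistent: Callable[[], None],
    max_retries: int = MAX_RETRIES,
) -> Tuple[str, int]:
    """Try a transient request, retry on failure, then escalate.

    send_transient returns True if the request gathered enough tokens.
    send_persistent is assumed to succeed eventually, since a persistent
    request stays active at every processor until it is satisfied.
    Returns the kind of request that completed and the retries used.
    """
    retries = 0
    while True:
        if send_transient():
            return ("transient", retries)
        if retries == max_retries:
            break  # one more retry would exceed the threshold
        retries += 1
    send_persistent()
    return ("persistent", retries)


# Usage: a request that fails twice, then succeeds on the second retry.
outcomes = iter([False, False, True])
result = acquire(lambda: next(outcomes), lambda: None)
```

In the common case the first transient request succeeds and the persistent path is never taken, which matches the slide's point that persistent requests should be rare.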
Token Coherence Generalizes • Correctness Substrate • Safety with token counting • Starvation avoidance with persistent requests • Performance Protocol • Make the common case fast • Alternatives: • TokenB – broadcast • TokenM – multicast with prediction • TokenD – soft-state directory for a large system
Token Coherence for CMPs • Ensuring safety with tokens • Tokens to individual caches • Exploiting CMP locality • Extra tokens to chip on initial requests • Scaling Persistent Requests
2-Level Directory / TokenCMP • [Figure: groups of processors (P), each group with a Local Dir, connected to a Global Dir; panel label: "TokenCMP logically"]
TokenCMP Reissues • Transient requests can fail • Coherence race • Network contention • When to reissue? • Reissue transient or persistent request?
TokenCMP Persistent Requests • Recall: • Anything that holds/sends/receives tokens must remember all outstanding persistent requests • Scale persistent requests via hierarchy • One outstanding persistent request per chip • Achieve performance via locality • Satisfy all local persistent requests first
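One way to picture the two hierarchy rules on this slide is the sketch below. It is an assumption-heavy illustration: the function `service_order`, the FIFO ordering between chips, and the batch-style input are all hypothetical and are not taken from the talk. It shows each chip occupying a single slot in the global order while all of its local persistent requests are satisfied before the next chip is served.

```python
from collections import deque


def service_order(requests):
    """Return the order in which persistent requests would be satisfied.

    requests: iterable of (chip, processor) pairs in arrival order.
    Assumptions of this sketch: chips are served FIFO by first arrival,
    each chip holds one slot in the global order, and a chip drains all
    of its local requests before the next chip is activated.
    """
    pending = {}  # chip -> local queue; dict insertion order is the global FIFO
    for chip, proc in requests:
        pending.setdefault(chip, deque()).append(proc)

    order = []
    for chip, local in pending.items():
        while local:  # satisfy all local persistent requests first
            order.append((chip, local.popleft()))
    return order


# Usage: chip A's second request is served before chip B's first.
order = service_order([("A", "p0"), ("B", "p0"), ("A", "p1")])
```

Grouping by chip keeps the number of globally visible persistent requests bounded by the number of chips rather than the number of processors.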
Conclusion • CMPs move the complexity from the processors to the memory system • Token Coherence reduces complexity by separating performance from correctness • CMP performance largely depends on coherence • Token Coherence gives designers more performance tradeoffs
Token Coherence Example • [Figure: P0 T=0, P1 T=0, P2 T=16 (R/W, max tokens)] • P0 issues a request to write (delayed to P2) • P1 issues a request to read
[Figure: P0 T=0, P1 T=1 (R), P2 T=15 (R)] • P2 responds with data to P1, sending T=1
[Figure: token counts unchanged] • P0’s delayed request arrives at P2
[Figure: P0 T=15 (R), P1 T=1 (R), P2 T=0] • P2 responds to P0, sending T=15
[Figure: P0 T=15 (R), P1 T=1 (R), P2 T=0] • Now what? (P0 wants all tokens)
[Figure: P1 sends T=1 to P0] • Timeout! • P0 reissues request • P1 responds with a token
[Figure: P0 T=16 (R/W), P1 T=0, P2 T=0] • P0’s request completed
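The token counts in the example above can be replayed with a short script. This is a sketch under simplifying assumptions: the helper names `send`, `state`, and `replay` are hypothetical, messages are modeled as instantaneous token moves, and the data payload and the timeout itself are not modeled.

```python
T = 16  # max tokens for the block, as in the example


def state(tokens, proc):
    """Label a token count the way the slides do: R/W, R, or nothing."""
    if tokens[proc] == T:
        return "R/W"
    return "R" if tokens[proc] >= 1 else "-"


def send(tokens, src, dst, count):
    """Move tokens from src to dst, conserving the total."""
    if count <= 0 or count > tokens[src]:
        raise ValueError(f"{src} cannot send {count} token(s)")
    tokens[src] -= count
    tokens[dst] += count


def replay():
    """Replay the example and return a snapshot after each response."""
    tokens = {"P0": 0, "P1": 0, "P2": T}
    snapshots = [dict(tokens)]

    # P1's read request reaches P2 first; P2 responds with one token.
    send(tokens, "P2", "P1", 1)
    snapshots.append(dict(tokens))

    # P0's delayed write request arrives; P2 sends its remaining tokens.
    send(tokens, "P2", "P0", tokens["P2"])
    snapshots.append(dict(tokens))

    # P0 holds 15 of 16 tokens and cannot write, so it reissues after a
    # timeout and P1 responds with its token.
    assert state(tokens, "P0") == "R"
    send(tokens, "P1", "P0", tokens["P1"])
    snapshots.append(dict(tokens))
    return snapshots


for snap in replay():
    print({p: (n, state(snap, p)) for p, n in snap.items()})
```

Every snapshot sums to 16 tokens, and P0 reaches the R/W state only in the last one, after the reissued request collects P1's token.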