
Token Coherence for CMPs

Mike Marty, CS 838, 12/09/2003


Presentation Transcript


  1. Token Coherence for CMPs. Mike Marty, CS 838, 12/09/2003

  2. Outline
     • Motivation for Token Coherence
     • Token Coherence 101
     • Issues with CMPs
       • Maintaining correctness
       • When to retry?
       • Persistent requests

  3. Motivation #1: Performance Tradeoffs
     • Snooping
       • Low latency (w/o network contention)
       • Bandwidth doesn’t scale
       • Requires ordered interconnect
     • Directory
       • No broadcast == scalability
       • Enables unordered interconnects
       • Indirections increase latency
     • Can we get the best of both worlds?

  4. Motivation #2: Complexity
     • Even basic coherence protocols are hard to get right
     • Future sources of additional complexity
       • Prediction (e.g. Destination-Set Prediction)
       • Hierarchy
       • CMPs
       • DNUCA

  5. Complexity Example
     [Diagram: six processors (P), two L2 caches, and a Dir/Mem node. Numbered messages: 1. GETX, 2. GETX, 2. WB, 3. Fwd, 3. Inv, 4. Fwd, 4. Inv, 5. DATA. Block states shown: S and O.]

  6. Token Coherence
     • Global Invariant
       • For each block, allow either one writer or multiple readers
     • Enforce Locally
       • Each (logical) block has T tokens (initially at memory)
       • Tokens are never created or destroyed
       • Components exchange tokens & data
       • All tokens to write <==> one-writer invariant
       • At least one token to read <==> multiple-readers invariant
     Invariants are enforced explicitly, rather than through a grab-bag of invalidations, acks, & nacks
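The token-counting rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the protocol's implementation: the names `Holder`, `transfer`, and `check_invariant` are hypothetical, and `T = 16` is an assumed token count chosen to match the T=16 shown in the example slides.

```python
from dataclasses import dataclass

T = 16  # tokens per block; an assumed value for this sketch


@dataclass
class Holder:
    """A component (cache or memory) that can hold tokens for one block."""
    name: str
    tokens: int = 0

    def can_read(self) -> bool:
        # At least one token is required to read.
        return self.tokens >= 1

    def can_write(self) -> bool:
        # All T tokens are required to write.
        return self.tokens == T


def transfer(src, dst, n):
    """Move n tokens from src to dst; tokens are never created or destroyed."""
    if not 0 < n <= src.tokens:
        raise ValueError(f"{src.name} cannot send {n} tokens")
    src.tokens -= n
    dst.tokens += n


def check_invariant(holders):
    """Check token conservation and the one-writer-or-many-readers rule."""
    assert sum(h.tokens for h in holders) == T
    writers = [h for h in holders if h.can_write()]
    readers = [h for h in holders if h.can_read()]
    assert len(writers) <= 1
    # If someone can write, nobody else holds a token, so nobody else can read.
    assert not writers or readers == writers


memory = Holder("memory", T)  # all tokens start at memory
p0 = Holder("P0")
p1 = Holder("P1")
system = [memory, p0, p1]

transfer(memory, p0, 1)   # P0 becomes a reader
transfer(memory, p1, 1)   # P1 becomes a reader
check_invariant(system)

transfer(memory, p0, 14)  # P0 collects the rest from memory...
transfer(p1, p0, 1)       # ...and the last token from P1
check_invariant(system)   # P0 is now the only holder and may write
```

Because the check is purely a local count, a holder never needs to know who else has tokens; that is the sense in which the invariant is enforced locally.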

  7. Starvation Avoidance
     • Purpose: handle pathological cases
     • Detect possible starvation
       • e.g. more than 4 retries issued
     • Invoke Persistent Request
       • Starving processor issues the request
       • Fair arbitration scheme activates request
       • Request persists at each processor until satisfied
       • Deactivate upon completion
     • Persistent Requests should be rare
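The escalation path on this slide can be sketched as a retry loop plus an arbiter. This is an illustration under stated assumptions: `acquire`, `try_transient`, `issue_persistent`, and `FifoArbiter` are hypothetical names; the threshold of 4 retries is the slide's own example; and FIFO ordering is only one possible "fair arbitration scheme", since the slide does not say which scheme is used.

```python
from collections import deque

MAX_RETRIES = 4  # the slide's example threshold


def acquire(try_transient, issue_persistent, max_retries=MAX_RETRIES):
    """Try transient requests; escalate once max_retries retries have failed.

    try_transient() returns True when the request was satisfied.
    Returns (kind, retries) where kind is "transient" or "persistent".
    """
    retries = 0
    while True:
        if try_transient():
            return "transient", retries
        if retries == max_retries:
            break  # another retry would exceed the threshold
        retries += 1
    issue_persistent()
    return "persistent", retries


class FifoArbiter:
    """Activates one persistent request at a time, in arrival order (assumed)."""

    def __init__(self):
        self.queue = deque()
        self.active = None

    def request(self, proc):
        self.queue.append(proc)
        self._activate_next()

    def complete(self, proc):
        # The active request is deactivated upon completion.
        if proc != self.active:
            raise ValueError(f"{proc} is not the active request")
        self.active = None
        self._activate_next()

    def _activate_next(self):
        if self.active is None and self.queue:
            self.active = self.queue.popleft()


arbiter = FifoArbiter()
result = acquire(lambda: False, lambda: arbiter.request("P0"))
```

With a transient request that always fails, `acquire` makes one initial attempt and four retries before handing the request to the arbiter, which keeps it active until `complete` is called.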

  8. Token Coherence Generalizes
     • Correctness Substrate
       • Safety with token counting
       • Starvation avoidance with persistent requests
     • Performance Protocol
       • Make the common case fast
       • Alternatives:
         • TokenB – broadcast
         • TokenM – multicast with prediction
         • TokenD – soft-state directory for a large system

  9. Token Coherence for CMPs
     • Ensuring safety with tokens
       • Tokens to individual caches
     • Exploiting CMP locality
       • Extra tokens to chip on initial requests
     • Scaling Persistent Requests

  10. 2-Level Directory / TokenCMP
     [Diagram: groups of processors (P) under Local Dir nodes, connected through a Global Dir. Panel labels: "2-Level Directory", "TokenCMP", "TokenCMP logically".]

  11. TokenCMP Reissues
     • Transient requests can fail
       • Coherence race
       • Network contention
     • When to reissue?
     • Reissue transient or persistent request?

  12. TokenCMP Persistent Requests
     • Recall:
       • Anything that holds/sends/receives tokens must remember all outstanding persistent requests
     • Scale persistent requests via hierarchy
       • One outstanding persistent request per chip
     • Achieve performance via locality
       • Satisfy all local persistent requests first
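One way to picture the two rules on this slide (one outstanding persistent request per chip, and local requests satisfied first) is the following sketch. Everything here is an assumption made for illustration: the names `Chip`, `global_table`, and `service_order` are hypothetical, requests within a chip are served in FIFO order, and chips are served in list order. The slides do not specify any of these details.

```python
from collections import deque


class Chip:
    """A CMP with its own queue of local persistent requests."""

    def __init__(self, chip_id):
        self.chip_id = chip_id
        self.local = deque()  # FIFO order within the chip (assumed)

    def request(self, proc):
        self.local.append(proc)

    def outstanding(self):
        """The single request this chip exposes off-chip, or None."""
        return self.local[0] if self.local else None


def global_table(chips):
    """What off-chip components must remember: at most one entry per chip."""
    return {c.chip_id: c.outstanding() for c in chips if c.local}


def service_order(chips):
    """Order in which requests are satisfied, draining each chip locally."""
    order = []
    for chip in chips:      # chips served in list order (assumed)
        while chip.local:   # satisfy all local persistent requests first
            order.append((chip.chip_id, chip.local.popleft()))
    return order


chip0, chip1 = Chip(0), Chip(1)
chip0.request("P0")
chip0.request("P1")
chip1.request("P4")

table = global_table([chip0, chip1])
order = service_order([chip0, chip1])
```

The table that remote components track grows with the number of chips rather than the number of processors, which is the scaling benefit the slide points at.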

  13. Conclusion
     • CMPs move the complexity from the processors to the memory system
     • Token Coherence reduces complexity by separating performance from correctness
     • CMP performance largely depends on coherence
     • Token Coherence gives designers more performance tradeoffs

  14. Token Coherence Example
     • P0 issues a request to write (delayed to P2)
     • P1 issues a request to read
     [Diagram: P0 (T=0), P1 (T=0), P2 (T=16, R/W, labeled "Max Tokens"); arrows labeled "Request to write", "Request to read", and "Delayed"; steps 1-3.]

  15. • P2 responds with data to P1
     [Diagram: P0, P1, P2; token labels T=0, T=16 (R/W), T=1(R), T=15(R); one token (T=1) in flight; steps 1-4.]

  16. • P0’s delayed request arrives at P2
     [Diagram: P0, P1, P2; token labels T=0, T=16 (R/W), T=1(R), T=15(R); steps 1-5.]

  17. • P2 responds to P0
     [Diagram: P0, P1, P2; token labels T=0, T=1(R), T=15(R), T=16 (R/W); 15 tokens (T=15) in flight; steps 1-7.]

  18. [Diagram: P0, P1, P2; token labels T=0, T=1(R), T=15(R), T=16 (R/W); steps 1-7.]

  19. Now what? (P0 wants all tokens)
     [Diagram: P0 (T=15, R), P1 (T=1, R), P2 (T=0).]

  20. Timeout!
     • P0 reissues request
     • P1 responds with a token
     [Diagram: P0 (T=15, R), P1 (T=1, R), P2 (T=0); one token (T=1) in flight; steps 8-9.]

  21. • P0’s request completed
     [Diagram: P0 (T=16, R/W), P1 (T=0), P2 (T=0).]
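The example in the preceding slides can be replayed as token arithmetic. The sketch below follows the token counts shown on the slides (T=16, with P2 initially holding every token); the function name `replay` and the helper `send` are hypothetical, and data transfer, timing, and the messages themselves are left out.

```python
T = 16  # total tokens for the block, as shown on the slides


def replay():
    """Replay the example's token movements; return (snapshots, stalled)."""
    tokens = {"P0": 0, "P1": 0, "P2": T}
    snapshots = []

    def send(src, dst, n):
        # Tokens only move between holders; the total never changes.
        assert 0 < n <= tokens[src]
        tokens[src] -= n
        tokens[dst] += n
        assert sum(tokens.values()) == T
        snapshots.append(dict(tokens))

    # P1's read request reaches P2 first: P2 sends data and one token.
    send("P2", "P1", 1)
    # P0's delayed write request reaches P2: P2 sends its remaining tokens.
    send("P2", "P0", 15)
    # P0 holds 15 of 16 tokens, so it can read but not write.
    stalled = tokens["P0"] < T
    # After a timeout, P0 reissues and P1 gives up its token.
    send("P1", "P0", 1)
    return snapshots, stalled


snapshots, stalled = replay()
```

The middle snapshot is the "Now what?" state: no component violates the invariant, but P0 cannot make progress until the reissued request collects the last token from P1.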
