
Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors


Presentation Transcript


  1. Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors http://iacoma.cs.uiuc.edu

  2. Motivation • CMPs are ubiquitous • Shared memory + caches = cache coherence • Traditional cache coherence solutions • shared bus-based: electrical, layout issues • directory-based: indirection, storage

  3. Embedded-ring cache coherence [ISCA 2006] • Novel snoopy cache coherence for mid-sized machines • logical ring is embedded in network • control messages use ring • data messages use any path • Simple and inexpensive to implement • Snoop requests can have long latencies

  4. Contributions • Propose invariant for transaction serialization • Propose performance enhancements • Uncorq: unconstrained snoop request delivery • reduces cache-to-cache transfer latency • Simple hardware data prefetching technique • reduces memory-to-cache transfer latency

  5. Embedded-ring terminology • Snoopy, invalidate protocol • Single supplier protocol • Types of messages: • snoop request (control message) • snoop response (control message) • snoop request + response (control message) • data [Figure: nodes A and B on the logical ring, showing requests, responses, a positive response (+), the snoop operation outcome, a positive snoop operation outcome, and the data message]

  6. Ordering invariant

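  7. Transaction serialization [Figure: timeline for nodes A and B showing cache-line states (S, I, M), the inv, ack, read, and data messages exchanged, and which node holds the old value and which the new value]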

  8. Serialization enforcement with embedded-ring • Logical unidirectional ring provides partial ordering • Distributed algorithm establishes global order • for same-address transactions • On simultaneous transactions to same address: • one is declared the “winner” (first to reach supplier) • others have to retry [Figure: node A's request and response traveling around the ring]

  9. How to serialize transactions • No clear “first” transaction • B’s request reaches S first • Ring guarantees responses are forwarded in the order S performed snoop operations • A receives B’s positive response before its own • A retries: B → A [Figure: nodes A, B, and supplier S on the ring, showing A’s request and response, B’s request and response, and the positive responses (+)]

  10. Enforcing transaction serialization • Node whose request arrives at supplier node first is the “winner” • What we need to enforce transaction serialization: • Ordering Invariant: the order in which responses travel the ring after leaving the supplier must be the same as the order in which the supplier processed their corresponding requests • Loser node sees other node’s positive response before its own [Figure: supplier S with requests and positive responses (+)]
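The winner/loser rule on this slide can be sketched in a few lines of Python. This is only an illustrative model, not the protocol from the paper: the function names `supplier_responses` and `outcome_at` are hypothetical, the supplier is assumed to answer positively only to the first request it processes, and every requester is assumed to observe the responses in the order they left the supplier (which is what the ordering invariant provides). The no-supplier case is not modeled.

```python
def supplier_responses(processing_order):
    """Responses leaving the supplier, in the order it processed the requests.

    Assumption: only the first request processed gets a positive response.
    """
    return [(node, index == 0) for index, node in enumerate(processing_order)]


def outcome_at(me, stream):
    """Decide whether requester `me` wins or must retry.

    `stream` is the sequence of (requester, positive) responses as observed
    by `me`. A node loses if it sees another node's positive response
    before its own response.
    """
    for requester, positive in stream:
        if requester == me:
            return "win" if positive else "retry"
        if positive:
            return "retry"
    raise ValueError(f"no response for {me} in stream")


# B's request reaches the supplier first, then A's.
stream = supplier_responses(["B", "A"])
print(outcome_at("B", stream), outcome_at("A", stream))
```

If the stream were reordered on its way around the ring, a requester could see its own response before the winner's positive response, which is why the invariant matters.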

  11. Uncorq: Unconstrained snoop request delivery

  12. Uncorq idea • Idea: requests do not have to follow the ring (but responses do) [Figure: request and response paths in the Baseline and in Uncorq]

  13. Benefit of Uncorq • Reduced cache-to-cache transfer latency [Figure: timelines for Baseline and Uncorq showing request, snoop, and data phases, the point where the request reaches the supplier node, and the resulting savings]
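To get a rough feel for why letting requests leave the ring helps, the sketch below compares hop counts only. It assumes an 8x8 2D torus (64 nodes, as in the experimental setup) and a hypothetical embedded ring laid out as a row-by-row snake; the real ring layout, router delays, and snoop times are not given in the slides, so the numbers are hop counts under these assumptions, not latencies.

```python
SIDE = 8  # assumed 8x8 torus (64 nodes)


def torus_hops(src, dst, side=SIDE):
    """Shortest-path hop count between two nodes of a 2D torus."""
    src_row, src_col = divmod(src, side)
    dst_row, dst_col = divmod(dst, side)
    d_row = abs(src_row - dst_row)
    d_col = abs(src_col - dst_col)
    return min(d_row, side - d_row) + min(d_col, side - d_col)


def snake_ring(side=SIDE):
    """Hypothetical embedded ring: even rows left to right, odd rows right to left."""
    order = []
    for row in range(side):
        cols = range(side) if row % 2 == 0 else range(side - 1, -1, -1)
        order.extend(row * side + col for col in cols)
    return order


RING = snake_ring()
POSITION = {node: index for index, node in enumerate(RING)}


def ring_hops(src, dst):
    """Hops from src to dst following the unidirectional ring."""
    return (POSITION[dst] - POSITION[src]) % len(RING)


pairs = [(s, d) for s in range(SIDE * SIDE) for d in range(SIDE * SIDE) if s != d]
avg_ring = sum(ring_hops(s, d) for s, d in pairs) / len(pairs)
avg_torus = sum(torus_hops(s, d) for s, d in pairs) / len(pairs)
print(f"average hops via ring: {avg_ring:.2f}, via shortest torus path: {avg_torus:.2f}")
```

Under these assumptions a request that follows the ring needs about 32 hops on average to reach another node, against about 4 hops on a shortest torus path.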

  14. Implications of Uncorq • Uncorq no longer restricts order of requests • Nodes may receive and process requests in any order • Responses may also get reordered • Problem: distributed algorithm relies on the fact that response order reflects order of requests at supplier

  15. Example: incorrect transaction ordering • Ordering invariant • A node cannot forward any other response if it has an outstanding positive snoop outcome [Figure: nodes A, B, and supplier S with requests and positive responses (+)]

  16. How Uncorq stalls responses • Local transaction table (per-node structure) • records messages that node is currently processing [Figure: nodes A, B, C with requests and positive responses (+), and a transaction table holding requests and responses for addr C]
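A per-node table that stalls responses could look roughly like the sketch below. It is a simplified model written for illustration: the class `NodeTable` and its methods are hypothetical names, it tracks a single outstanding positive outcome per address, and it assumes the node snoops a request before the matching response arrives. The actual table layout in Uncorq is not described in the slides.

```python
from collections import defaultdict, deque


class NodeTable:
    """Hypothetical local transaction table for one node."""

    def __init__(self):
        # addr -> requester whose request this node snooped with a positive outcome
        self.outstanding_positive = {}
        # addr -> responses from other requesters that are being held back
        self.stalled = defaultdict(deque)
        # log of (addr, requester, positive) responses forwarded on the ring
        self.forwarded = []

    def snoop_request(self, addr, requester, positive):
        """Record the outcome of snooping a request (which may arrive off-ring)."""
        if positive:
            self.outstanding_positive[addr] = requester

    def receive_response(self, addr, requester):
        """Handle a response arriving on the ring."""
        owner = self.outstanding_positive.get(addr)
        if owner is not None and owner != requester:
            # Outstanding positive outcome for someone else: stall this response.
            self.stalled[addr].append(requester)
            return
        self.forwarded.append((addr, requester, owner == requester))
        if owner == requester:
            # The positive response has left; release everything stalled behind it.
            del self.outstanding_positive[addr]
            while self.stalled[addr]:
                self.forwarded.append((addr, self.stalled[addr].popleft(), False))


table = NodeTable()
table.snoop_request(0x40, "A", positive=True)   # A's request arrives first
table.receive_response(0x40, "B")               # B's response is held back
table.receive_response(0x40, "A")               # A's positive response goes out first
print(table.forwarded)
```

Holding B's response until A's positive response has been forwarded keeps the responses on the ring in the order in which this node processed the requests.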

  17. Optimization: prefetching from memory • Goal: reduce latency of memory-to-cache transfers • Access memory in parallel with ring snoop • Predict when no node will supply data [Figure: requester R and memory; unoptimized case performs the ring snoop (1) and then the memory access (2), optimized case starts both at step (1)]
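The effect of overlapping the memory access with the ring snoop can be illustrated with a toy latency model. The function name `miss_latency` and the cycle counts are made up for this example, and the cost of a wrong prediction is not modeled.

```python
def miss_latency(ring_snoop, memory_access, prefetch):
    """Toy model of a miss that no cache can supply.

    Unoptimized: memory is accessed only after the ring snoop completes.
    Optimized:   memory is accessed in parallel with the ring snoop.
    """
    if prefetch:
        return max(ring_snoop, memory_access)
    return ring_snoop + memory_access


# Hypothetical latencies in cycles, chosen only for illustration.
print(miss_latency(200, 150, prefetch=False))  # sequential: 350
print(miss_latency(200, 150, prefetch=True))   # overlapped: 200
```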

  18. Evaluation

  19. Experimental setup • 64 nodes in a single CMP • Interconnection network: 2D torus with embedded-ring • SESC simulator (sesc.sourceforge.net) • SPLASH-2, SPECjbb and SPECweb workloads

  20. Cache-to-cache transfer latency • Substantial reduction in latency with Uncorq [Figure: two histograms, Baseline and Uncorq; x-axis: cache-to-cache transfer latency (0 to 600); y-axes: distribution (%) and cumulative distribution (%)]

  21. Execution time • Uncorq significantly reduces execution time (reduction: 5-23%) • Uncorq + Pref performs the best (reduction: 13-26%) [Figure: normalized execution time (0 to 1) of Baseline, Uncorq, and Uncorq+Pref for SPLASH-2, SPECjbb, and SPECweb]

  22. Also in the paper • Serialization mechanism for case with no supplier • System and node forward progress • Fences and memory consistency issues • Characterization of prefetching mechanism • Comparison against ccHyperTransport

  23. Conclusion • Propose invariant for transaction serialization • Propose performance enhancements • Uncorq: unconstrained snoop request delivery • Simple hardware data prefetching technique • Reduce execution time by 13-26%

  24. Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors http://iacoma.cs.uiuc.edu
