CS 162 Memory Consistency Models

Reordering in Uniprocessors
• Memory operations are reordered to improve performance
  - Hardware (e.g., store buffer, reorder buffer)
  - Compiler (e.g., code motion, caching a value in a register)
• The program behaves the same as long as dependences are respected:
  a1: St x; a2: Ld y  ≡  a2: Ld y; a1: St x
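The equivalence above can be checked with a toy single-processor interpreter. This is a minimal sketch, assuming the only dependences are memory dependences through the same address (register dependences are not modeled); the op encoding and the helper names `run` and `may_swap` are hypothetical.

```python
def run(program, memory):
    """Execute ops in order on one processor.

    Each op is (label, "St", addr, value) or (label, "Ld", addr).
    Returns the final memory and the value each load returned.
    """
    mem = dict(memory)
    loaded = {}
    for label, kind, addr, *value in program:
        if kind == "St":
            mem[addr] = value[0]
        else:
            loaded[label] = mem[addr]
    return mem, loaded


def may_swap(op1, op2):
    """Adjacent ops may be swapped unless they touch the same address
    and at least one of them is a store (a memory dependence)."""
    same_addr = op1[2] == op2[2]
    has_store = "St" in (op1[1], op2[1])
    return not (same_addr and has_store)


init = {"x": 0, "y": 7}
a1 = ("a1", "St", "x", 1)
a2 = ("a2", "Ld", "y")
print(may_swap(a1, a2), run([a1, a2], init) == run([a2, a1], init))  # True True

b2 = ("b2", "Ld", "x")  # depends on a1 through x
print(may_swap(a1, b2), run([a1, b2], init) == run([b2, a1], init))  # False False
```

Swapping St x and Ld y leaves both the final memory and the loaded value unchanged; swapping St x and Ld x does not.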

Reordering in Multiprocessors
• Reordering can cause counter-intuitive program behavior
Initially x = y = 0
P1: a1: x = 1; a2: y = 1;
P2: b1: Ry = y; b2: Rx = x;
Possible outcomes:
• b1; b2; a1; a2 → (Rx=0, Ry=0)
• a1; b1; b2; a2 → (Rx=1, Ry=0)
• a1; a2; b1; b2 → (Rx=1, Ry=1)
• a2; b1; b2; a1 (a1 and a2 reordered) → (Rx=0, Ry=1)
• Intuitively, y=1 implies x=1, so (Rx=0, Ry=1) is counter-intuitive
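The outcomes of this example can be enumerated mechanically. This sketch explores every interleaving of P1 and P2 once with program order kept and once with a1 and a2 swapped, modeling a reordering of P1's two stores (reordering b1 and b2 would expose the same extra outcome). The op encoding and helper names are hypothetical.

```python
def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each list's own order."""
    if not p1 or not p2:
        yield p1 + p2
        return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest


def run(order):
    """Run one interleaving on a single shared memory; return (Rx, Ry)."""
    mem = {"x": 0, "y": 0}            # initially x = y = 0
    regs = {}
    for kind, target, source in order:
        if kind == "st":              # ("st", addr, value)
            mem[target] = source
        else:                         # ("ld", reg, addr)
            regs[target] = mem[source]
    return regs["Rx"], regs["Ry"]


def outcomes(p1, p2):
    return {run(order) for order in interleavings(p1, p2)}


P1 = [("st", "x", 1), ("st", "y", 1)]          # a1; a2
P2 = [("ld", "Ry", "y"), ("ld", "Rx", "x")]    # b1; b2

in_order = outcomes(P1, P2)
with_reordering = in_order | outcomes(P1[::-1], P2)  # also allow a2 before a1
print(sorted(in_order))                    # [(0, 0), (1, 0), (1, 1)]
print(sorted(with_reordering - in_order))  # [(0, 1)]
```

Only the reordered runs produce (Rx=0, Ry=1).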

Reordering in Multiprocessors
• Counter-intuitive program behavior
Initially p = NULL, flag = false
P1: p = new A(…); flag = true;
P2: if (flag) a = p->var;
• flag is supposed to be set only after p is allocated; if the two stores are reordered, P2 can see flag == true while p is still NULL
• The same problem affects lock-free algorithms, e.g., Dekker, Peterson

Reordering in Multiprocessors
• Dekker's algorithm (mutual exclusion)
• Counter-intuitive program behavior
Initially flag1 = flag2 = 0
P1: flag1 = 1; if (flag2 == 0) critical section
P2: flag2 = 1; if (flag1 == 0) critical section
• If P1's St flag1 is reordered after its Ld flag2, P1 reads flag2 == 0; symmetrically, P2 can read flag1 == 0
• After reordering, both loads can return 0, and both processors enter the critical section
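One way this reordering arises in hardware is a store buffer. The sketch below assumes a simplified model in which each processor queues its stores in a private FIFO buffer and a load reads its own buffer first, then shared memory; the `Processor` class is hypothetical and is replayed along one specific schedule.

```python
from collections import deque


class Processor:
    """Toy processor with a FIFO store buffer (a simplified, TSO-like model)."""

    def __init__(self, memory):
        self.memory = memory          # shared dict
        self.buffer = deque()         # pending (addr, value) stores

    def store(self, addr, value):
        self.buffer.append((addr, value))   # not yet visible to others

    def load(self, addr):
        for a, v in reversed(self.buffer):  # forward the youngest own store
            if a == addr:
                return v
        return self.memory[addr]

    def drain(self):
        while self.buffer:
            addr, value = self.buffer.popleft()
            self.memory[addr] = value


memory = {"flag1": 0, "flag2": 0}
p1, p2 = Processor(memory), Processor(memory)

p1.store("flag1", 1)                  # St flag1 waits in P1's buffer
p2.store("flag2", 1)                  # St flag2 waits in P2's buffer
p1_enters = p1.load("flag2") == 0     # Ld flag2 bypasses the store: reads 0
p2_enters = p2.load("flag1") == 0     # Ld flag1 bypasses the store: reads 0
p1.drain()
p2.drain()

print(p1_enters, p2_enters)  # True True: both enter the critical section
```

Each store is eventually written to memory, but too late for the other processor's load.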

Memory Consistency Models
• Specify the ordering of loads and stores to different memory locations
  - Ld → Ld, Ld → St, St → Ld, St → St
• A contract between hardware, compiler, and programmer
  - Hardware and compiler will not violate the specified ordering
  - The programmer will not assume a stricter order than the model provides
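The four orderings can be written down as a small table per model. This is an illustrative sketch: SC keeps all four, and TSO (not covered on these slides; included only as a familiar weaker model, e.g. x86-TSO) relaxes St → Ld because of its store buffer. The `PRESERVED` table and `may_reorder` helper are hypothetical names.

```python
# Orderings between accesses to *different* locations that each model keeps.
ORDERINGS = [("Ld", "Ld"), ("Ld", "St"), ("St", "Ld"), ("St", "St")]

PRESERVED = {
    "SC": set(ORDERINGS),                        # keeps all four
    "TSO": set(ORDERINGS) - {("St", "Ld")},      # a later load may pass an earlier store
}


def may_reorder(model, first, second):
    """True if `model` lets `second` appear to execute before `first`."""
    return (first, second) not in PRESERVED[model]


for model in PRESERVED:
    relaxed = [f"{a} -> {b}" for a, b in ORDERINGS if may_reorder(model, a, b)]
    print(model, relaxed)
# SC []
# TSO ['St -> Ld']
```

Under the contract, code written for TSO must not assume St → Ld order without a fence.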

Memory Consistency Models
• Stronger models: stronger constraints, fewer memory reorderings, easier to reason about (high programmability), lower performance
• Weaker models: more memory reorderings, harder to reason about (low programmability), higher performance

Cache Coherence vs. Memory Model
• Cache coherence ensures a consistent view of memory
  - Guarantees that an update to memory by one processor will eventually be seen by other processors
• But how consistent?
  - NO guarantee on when an update will be seen
  - NO guarantee on the order in which updates (to different locations) will be seen

Cache Coherence vs. Memory Model
Initially A = B = 0
P1: A = 1;
P2: while (A != 1) ; B = 1;
P3: while (B != 1) ; tmp = A;
• tmp = 1? or tmp = 0?
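The question can be answered with a toy model in which each processor keeps its own copy of memory and an update reaches each other processor independently. This model is an assumption, not a description of any particular machine. Every update eventually reaches every processor and each location has a single writer, so the coherence guarantee on the previous slide holds, yet tmp = 0 is reachable.

```python
# Toy model: every processor keeps its own copy of memory, and a store reaches
# each other processor only when it is explicitly propagated there.
views = {p: {"A": 0, "B": 0} for p in ("P1", "P2", "P3")}


def store(proc, addr, value):
    views[proc][addr] = value


def propagate(src, dst, addr):
    views[dst][addr] = views[src][addr]


store("P1", "A", 1)            # P1: A = 1
propagate("P1", "P2", "A")     # the update reaches P2 first
# P2: while (A != 1) ; now exits, since views["P2"]["A"] == 1
store("P2", "B", 1)            # P2: B = 1
propagate("P2", "P3", "B")     # B = 1 reaches P3 before A = 1 does
# P3: while (B != 1) ; now exits, since views["P3"]["B"] == 1
tmp = views["P3"]["A"]         # P3: tmp = A reads its stale copy
propagate("P1", "P3", "A")     # A = 1 eventually reaches P3 ...
propagate("P2", "P1", "B")     # ... and B = 1 eventually reaches P1

print(tmp)  # 0
```

Whether real hardware permits this outcome is decided by its memory model, not by coherence.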

Sequential Consistency (SC)
• Definition [Lamport]:
  (1) the result of any execution is the same as if the operations of all processors were executed in some sequential order, and
  (2) the operations of each individual processor appear in this sequence in the order specified by its program.
• Behaves like repeating:
  - Pick a processor by any method (e.g., randomly)
  - That processor completes one load/store operation
(Figure: processors P1, P2, P3, …, Pn sharing a single MEMORY)
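The "pick a processor, complete one operation" view translates directly into a loop. This is a minimal sketch with a hypothetical `run_sc` function and op encoding; it picks processors with a seeded random generator and runs the reordering example many times.

```python
import random


def run_sc(programs, memory, seed=None):
    """Sequential consistency as a loop: pick a processor at random and let it
    complete its next load/store, in program order, on one shared memory.

    Ops are ("st", addr, value) or ("ld", reg, addr).
    Returns the final memory, each processor's registers, and the global order.
    """
    rng = random.Random(seed)
    mem = dict(memory)
    pcs = [0] * len(programs)
    regs = [{} for _ in programs]
    order = []
    while True:
        ready = [i for i, prog in enumerate(programs) if pcs[i] < len(prog)]
        if not ready:
            return mem, regs, order
        i = rng.choice(ready)                 # pick a processor by any method
        kind, target, source = programs[i][pcs[i]]
        if kind == "st":
            mem[target] = source
        else:
            regs[i][target] = mem[source]
        order.append((i, pcs[i]))             # (processor, op index)
        pcs[i] += 1


P1 = [("st", "x", 1), ("st", "y", 1)]          # a1; a2
P2 = [("ld", "Ry", "y"), ("ld", "Rx", "x")]    # b1; b2

seen = set()
for seed in range(200):
    _, regs, _ = run_sc([P1, P2], {"x": 0, "y": 0}, seed)
    seen.add((regs[1]["Rx"], regs[1]["Ry"]))
print((0, 1) in seen)  # False: (Rx=0, Ry=1) is never an SC result
```

Because each processor advances only through its own next operation, condition (2) of the definition holds by construction.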

SC Example
P1: a1: x = 1; a2: y = 1;
P2: b1: Ry = y; b2: Rx = x;
The six SC interleavings (each keeps a1 before a2 and b1 before b2):
• a1; a2; b1; b2 → (Rx=1, Ry=1)
• a1; b1; a2; b2 → (Rx=1, Ry=0)
• a1; b1; b2; a2 → (Rx=1, Ry=0)
• b1; a1; a2; b2 → (Rx=1, Ry=0)
• b1; a1; b2; a2 → (Rx=1, Ry=0)
• b1; b2; a1; a2 → (Rx=0, Ry=0)
• Under SC, the counter-intuitive outcome (Rx=0, Ry=1) never occurs

Sequential Consistency (SC)
• Simple and intuitive
  - Consistent with programmers' intuition
  - Easy to reason about program behavior
• However, the simplicity comes at the cost of performance
  - Prevents aggressive compiler optimizations (e.g., load reordering, store reordering, caching a value in a register)
  - Constrains hardware utilization (e.g., store buffers)

SC Violation
P1: a1: x = 1; a2: y = 1;
P2: b1: R1 = y; b2: R2 = x;
(Figure: program-order edges a1 → a2 and b1 → b2; conflict edges between a2 and b1 on y, and between a1 and b2 on x)
• SC violation: a cycle formed by program orders and conflict orders [Shasha and Snir, 1988]
  - e.g., (a2, b1, b2, a1, a2)
  - Executing in the order (a2, b1, b2, a1) produces R1 = 1, R2 = 0, which is not an SC outcome
• Insert fences to break the cycle
  - a2 cannot be executed before a1
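The cycle criterion can be applied to one concrete execution: orient each conflict by the order in which the two accesses actually ran, add the program-order edges, and look for a cycle. This is a simplified runtime check, not Shasha and Snir's static analysis over all executions; the op table and helper names are hypothetical.

```python
OPS = {                      # name -> (processor, kind, address)
    "a1": ("P1", "st", "x"),
    "a2": ("P1", "st", "y"),
    "b1": ("P2", "ld", "y"),
    "b2": ("P2", "ld", "x"),
}
PROGRAM_ORDER = [("a1", "a2"), ("b1", "b2")]


def conflict_edges(execution):
    """Orient each conflict (same address, different processors, at least one
    store) from the op that executed first to the one that executed second."""
    pos = {op: i for i, op in enumerate(execution)}
    edges = []
    for u, (pu, ku, au) in OPS.items():
        for v, (pv, kv, av) in OPS.items():
            if pu != pv and au == av and "st" in (ku, kv) and pos[u] < pos[v]:
                edges.append((u, v))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])
    state = dict.fromkeys(graph, "new")

    def visit(u):
        state[u] = "active"
        for v in graph[u]:
            if state[v] == "active" or (state[v] == "new" and visit(v)):
                return True
        state[u] = "done"
        return False

    return any(state[u] == "new" and visit(u) for u in graph)


def violates_sc(execution):
    return has_cycle(PROGRAM_ORDER + conflict_edges(execution))


print(violates_sc(["a2", "b1", "b2", "a1"]))  # True: a1 -> a2 -> b1 -> b2 -> a1
print(violates_sc(["a1", "a2", "b1", "b2"]))  # False
```

A fence between a1 and a2 rules out executions where a2 runs first, so the cycle can no longer close.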

Fence Instructions
• Fence instructions order memory operations before and after the fence
P1: p = new A(…); FENCE; flag = true;
• Inevitable: needed to build concurrent implementations (e.g., mutual exclusion, queues) [Attiya et al., POPL'11]
• Expensive: Cilk-5's THE protocol spends 50% of its time executing a memory fence [Frigo et al., PLDI'98]
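Applied to Dekker's algorithm, a fence goes between each processor's store and its load (a St → Ld fence, unlike the St → St fence in the example above). This sketch exhaustively explores a toy store-buffer model, assuming loads may bypass buffered stores and a fence waits until its own processor's buffer has drained, and compares the outcomes with and without the fence. The `explore` function is hypothetical.

```python
def explore(use_fence):
    """Return every (r1, r2) pair reachable in a toy store-buffer model of
    Dekker's entry code, where r1 is P1's read of flag2 and r2 is P2's read
    of flag1 (0 means "enter the critical section")."""
    progs = []
    for mine, other in (("flag1", "flag2"), ("flag2", "flag1")):
        prog = [("st", mine)]                  # my flag = 1
        if use_fence:
            prog.append(("fence", None))
        prog.append(("ld", other))             # if (other flag == 0) ...
        progs.append(prog)

    results = set()

    def step(pcs, bufs, mem, regs):
        moved = False
        for i in (0, 1):
            if bufs[i]:                        # drain oldest buffered store
                moved = True
                new_bufs = list(bufs)
                new_bufs[i] = bufs[i][1:]
                step(pcs, tuple(new_bufs), {**mem, bufs[i][0]: 1}, regs)
            if pcs[i] == len(progs[i]):
                continue
            kind, addr = progs[i][pcs[i]]
            if kind == "fence" and bufs[i]:
                continue                       # fence waits for an empty buffer
            moved = True
            new_pcs, new_bufs, new_regs = list(pcs), list(bufs), list(regs)
            new_pcs[i] += 1
            if kind == "st":
                new_bufs[i] = bufs[i] + (addr,)     # store enters the buffer
            elif kind == "ld":                      # own buffer first, else memory
                new_regs[i] = 1 if addr in bufs[i] else mem[addr]
            step(tuple(new_pcs), tuple(new_bufs), mem, tuple(new_regs))
        if not moved:                          # all done and buffers empty
            results.add(regs)

    step((0, 0), ((), ()), {"flag1": 0, "flag2": 0}, (None, None))
    return results


print((0, 0) in explore(use_fence=False))  # True: both may enter
print((0, 0) in explore(use_fence=True))   # False: mutual exclusion holds
```

With the fence, whichever processor loads second must see the other's flag already in memory, so (0, 0) disappears.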

Conservativeness of Fences
P1: a1: St x; Fence1; a2: Ld y
P2: b1: St y; Fence2; b2: Ld x
• At time T, a1 and a2 have completed; b1 and b2 execute only after time T
• No cycle is formed at runtime
• Yet fences are inserted statically and conservatively

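Conservativeness of Fences
Example 1:
P1: if (cond) a1: St x; Fence1; a2: Ld y
P2: b1: St y; Fence2; b2: Ld x
Example 2:
P1: a1: St *p; Fence1; a2: Ld x
P2: b1: St x; Fence2; b2: Ld *q
• In Example 1, a1 is in a conditional branch
• In Example 2, p and q may point to the same memory location
• In executions where cond is false, or where p and q differ, no cycle is formed at runtime
• Fences are nonetheless inserted statically and conservatively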

Processor-centric Fence
• Traditional fences are processor-centric
  - Unaware of memory accesses in other processors
• However, the purpose of fences is to
  - Prevent memory accesses from being reordered and observed by other processors (i.e., a cycle formed at runtime)

Address-aware Fences
• Consider the memory locations accessed around fences at runtime
• Fences take effect only when a cycle is about to happen

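Detect and Avoid Cycles
(Figure: Proc 1 runs a1: …, Fence1, a2: …; Proc 2 runs b1: …, Fence2, b2: …; access groups A1, A2, B1, B2; a conflict edge c1 and a possible conflict edge c2 that would close a cycle)
• How to detect c2 efficiently?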

Detect and Avoid Cycles
(Figure: as above, with a watchlist attached to each fence)
• How to detect c2 efficiently?
  - Collect a watchlist for each fence
  - Each completing memory operation checks the watchlist:
    - bypass, if its address is not in the watchlist
    - stall, otherwise
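The bypass/stall rule can be modeled in a few lines. This is a hypothetical sketch of the rule as stated on the slide, not the mechanism from Lin, Nagarajan, and Gupta: the watchlist is simply passed in (how it is collected at runtime is not modeled), and a post-fence operation stalls until the pre-fence stores drain only if its address is on the watchlist.

```python
from collections import deque


def issue_after_fence(pre_fence, post_fence, watchlist):
    """Hypothetical address-aware fence.

    pre_fence:  addresses of stores issued before the fence, still pending
    post_fence: addresses of memory ops after the fence, in program order
    watchlist:  addresses this fence must guard (assumed given)
    Returns the sequence of (action, address) events.
    """
    pending = deque(pre_fence)
    events = []
    for addr in post_fence:
        if addr in watchlist:
            events.append(("stall", addr))        # might close a cycle
            while pending:                         # wait for pre-fence stores
                events.append(("drain", pending.popleft()))
        else:
            events.append(("bypass", addr))       # safe to pass the fence
        events.append(("complete", addr))
    while pending:
        events.append(("drain", pending.popleft()))
    return events


for event in issue_after_fence(["x"], ["y", "z"], watchlist={"z"}):
    print(event)
```

Here the access to y completes before the pending store to x, while the access to z waits. Putting every post-fence address on the watchlist makes the sketch behave like a traditional fence.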

Performance: Execution Time
(Figure: execution time with traditional fences (T) vs. address-aware fences (A))
• With address-aware fences, fence overhead becomes negligible

Further Reading
• L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9):690–691, 1979.
• S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29(12):66–76, 1996.
• D. Shasha and M. Snir. Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst., 10(2):282–312, 1988.
• D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence. Synthesis Lectures on Computer Architecture, 2011.
• C. Lin, V. Nagarajan, and R. Gupta. Address-aware fences. ICS '13, pages 313–324, 2013.