
### CS 162 Memory Consistency Models

Reordering in Uniprocessors

- Memory operations are reordered to improve performance
  - Hardware (e.g., store buffer, reorder buffer)
  - Compiler (e.g., code motion, caching a value in a register)
- The program behaves the same as long as dependences are respected

a1: St x; a2: Ld y ≡ a2: Ld y; a1: St x

- In a multiprocessor, reordering can produce counter-intuitive program behavior

Initially x = y = 0

| P1 | P2 |
|---|---|
| a1: x = 1; | b1: Ry = y; |
| a2: y = 1; | b2: Rx = x; |

Possible outcomes

- (Rx=0, Ry=0)
- (Rx=0, Ry=1)
- (Rx=1, Ry=0)
- (Rx=1, Ry=1)

Intuitively, y=1 ⇒ x=1: if P2 reads Ry=1, it should also read Rx=1. The outcome (Rx=0, Ry=1) is possible only if a1 and a2, or b1 and b2, are reordered.

- Counter-intuitive program behavior

Initially p = NULL, flag = false

| P1 | P2 |
|---|---|
| p = new A(…) | if (flag) |
| flag = true; | a = p->var; |

flag is supposed to be set after p is allocated

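- Counter-intuitive program behavior
  - Lock-free algorithms, e.g., Dekker, Peterson
  - Dekker algorithm (mutual exclusion)

Initially flag1 = flag2 = 0

| P1 | P2 |
|---|---|
| flag1 = 1; | flag2 = 1; |
| if (flag2 == 0) | if (flag1 == 0) |
| critical section | critical section |

In P1, flag1 = 1 is a store (St flag1) and the test flag2 == 0 is a load (Ld flag2) from a different location, so the load may be reordered before the store. The same holds for P2.

After reordering, both processors can read 0 for the other's flag, and both enter the critical section.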
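The effect can be reproduced with a toy store-buffer model. The sketch below is only an illustration, not a model of any particular machine: the `Processor` class, its `drain` method, and the `drain_before_load` parameter are names invented here. It assumes that each store waits in a per-processor FIFO buffer until it is drained to shared memory.

```python
class Processor:
    """A processor with a FIFO store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = []  # pending (variable, value) stores, oldest first

    def store(self, var, value):
        # The store goes into the buffer; other processors cannot see it yet.
        self.buffer.append((var, value))

    def load(self, var):
        # Forward from the newest matching buffered store, else read memory.
        for buffered_var, value in reversed(self.buffer):
            if buffered_var == var:
                return value
        return self.memory[var]

    def drain(self):
        # Write all buffered stores to shared memory, oldest first.
        for var, value in self.buffer:
            self.memory[var] = value
        self.buffer.clear()


def dekker_entry(drain_before_load):
    """Return (P1 enters, P2 enters) for one fixed schedule."""
    memory = {"flag1": 0, "flag2": 0}
    p1, p2 = Processor(memory), Processor(memory)

    p1.store("flag1", 1)
    p2.store("flag2", 1)
    if drain_before_load:
        # Make both stores globally visible before either load runs.
        p1.drain()
        p2.drain()
    p1_enters = p1.load("flag2") == 0
    p2_enters = p2.load("flag1") == 0
    p1.drain()
    p2.drain()
    return p1_enters, p2_enters


print(dekker_entry(drain_before_load=False))  # (True, True)
print(dekker_entry(drain_before_load=True))   # (False, False)
```

With the stores still buffered, each load effectively runs before the other processor's store, which is the St/Ld reordering described above.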

Memory Consistency Models

- Specify the ordering of loads and stores to different memory locations
  - Ld → Ld, Ld → St, St → Ld, St → St
- Contract between hardware, compiler, and programmer
  - the hardware and compiler will not violate the ordering specified
  - the programmer will not assume a stricter order than that of the model

Memory Consistency Models

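- Stronger models: stronger constraints, fewer memory reorderings, easier to reason about, lower performance
- Programmability runs from high (stronger models) to low (weaker models)
- Performance runs from low (stronger models) to high (weaker models)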

Cache Coherence vs. Memory Model

- Cache coherence ensures a consistent view of memory
  - Guarantees that an update to memory by one processor will eventually be seen by other processors
- But how consistent?
  - NO guarantees on when an update should be seen
  - NO guarantees on the order in which updates should be seen

Cache Coherence vs. Memory Model

Initially A = B = 0

| P1 | P2 | P3 |
|---|---|---|
| A = 1; | while (A != 1) ; | while (B != 1) ; |
| | B = 1; | tmp = A; |

tmp = 1? or tmp = 0?

[Figure: processors (P2, P3, …, Pn) sharing a single MEMORY]

Sequential Consistency (SC)

- Definition [Lamport]
  - (1) the result of any execution is the same as if the operations of all processors were executed in some sequential order;
  - (2) the operations of each individual processor appear in this sequence in the order specified by its program.
- Behaves as the repetition of:
  - Pick a processor by any method (e.g., randomly)
  - The processor completes a load/store operation
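The "pick a processor, complete one operation" view can be turned into a small enumerator. The following is a sketch under stated assumptions: it hard-codes the earlier two-processor example (a1, a2 on P1; b1, b2 on P2), every store writes 1, and the helper names `interleavings`, `run`, and `sc_outcomes` are made up for illustration.

```python
from itertools import combinations

# Each operation is (label, kind, variable, register); register is None for
# stores, and every store writes the value 1, as in the example program.
P1 = [("a1", "St", "x", None), ("a2", "St", "y", None)]
P2 = [("b1", "Ld", "y", "Ry"), ("b2", "Ld", "x", "Rx")]


def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each program's own order."""
    n = len(p1) + len(p2)
    for slots in combinations(range(n), len(p1)):
        it1, it2 = iter(p1), iter(p2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]


def run(sequence):
    """Execute one interleaving against a single shared memory."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for _label, kind, var, reg in sequence:
        if kind == "St":
            memory[var] = 1
        else:
            regs[reg] = memory[var]
    return regs["Rx"], regs["Ry"]


def sc_outcomes(p1, p2):
    """Collect the (Rx, Ry) results of all sequentially consistent executions."""
    return {run(seq) for seq in interleavings(p1, p2)}


print(sorted(sc_outcomes(P1, P2)))  # [(0, 0), (1, 0), (1, 1)]
```

The pair (Rx=0, Ry=1) never appears: no interleaving that keeps both program orders can produce it.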

| P1 | P2 |
|---|---|
| a1: x = 1; | b1: Ry = y; |
| a2: y = 1; | b2: Rx = x; |

[Figure: interleavings of a1, a2, b1, b2 that preserve each processor's program order; the outcome shown is (Rx=0, Ry=0). The figure also includes the reordering a1: x = 1; a2: y = 1 ≡ a2: y = 1; a1: x = 1.]

Sequential Consistency (SC)

- Simple and intuitive
  - consistent with programmers' intuition
  - easy to reason about program behavior
- However, the simplicity comes at the cost of performance
  - prevents aggressive compiler optimizations (e.g., load reordering, store reordering, caching a value in a register)
  - constrains the use of hardware features (e.g., store buffer)

SC Violation

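| P1 | P2 |
|---|---|
| a1: x = 1 | b1: R1 = y |
| a2: y = 1 | b2: R2 = x |

[Figure legend: program order edges (a1 → a2, b1 → b2) and conflict relation edges (a1 and b2 on x, a2 and b1 on y)]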

SC Violation

- A cycle formed by program orders and conflict orders [Shasha and Snir, 1988], e.g., (a2, b1, b2, a1, a2)
- Executing in the order (a2, b1, b2, a1) will produce R1=1, R2=0, which is not an SC outcome
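The cycle check can be written out as a small graph search. This is a sketch for the four-operation example only, not the full analysis from the cited paper. It assumes an execution is given as the order in which operations take effect on memory, and the names `conflict_edges`, `has_cycle`, and `violates_sc` are illustrative.

```python
def conflict_edges(execution, accesses):
    """Order each pair of conflicting operations as they occurred at runtime.

    Two operations conflict if they run on different processors, touch the
    same variable, and at least one of them is a store.
    """
    edges = []
    for i, first in enumerate(execution):
        for second in execution[i + 1:]:
            proc1, kind1, var1 = accesses[first]
            proc2, kind2, var2 = accesses[second]
            if proc1 != proc2 and var1 == var2 and "St" in (kind1, kind2):
                edges.append((first, second))
    return edges


def has_cycle(nodes, edges):
    """Depth-first search for a directed cycle."""
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
    state = {n: "unvisited" for n in nodes}

    def visit(node):
        state[node] = "in progress"
        for nxt in successors[node]:
            if state[nxt] == "in progress":
                return True
            if state[nxt] == "unvisited" and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(state[n] == "unvisited" and visit(n) for n in nodes)


ACCESSES = {
    "a1": ("P1", "St", "x"), "a2": ("P1", "St", "y"),
    "b1": ("P2", "Ld", "y"), "b2": ("P2", "Ld", "x"),
}
PROGRAM_ORDER = [("a1", "a2"), ("b1", "b2")]


def violates_sc(execution):
    """True if program order plus runtime conflict order contains a cycle."""
    edges = PROGRAM_ORDER + conflict_edges(execution, ACCESSES)
    return has_cycle(list(ACCESSES), edges)


print(violates_sc(["a2", "b1", "b2", "a1"]))  # True
print(violates_sc(["a1", "a2", "b1", "b2"]))  # False
```

For the order (a2, b1, b2, a1), the edges a2 → b1 and b2 → a1 join the two program-order edges into the cycle named above.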

Insert fences to break the cycle

- With a fence between a1 and a2, a2 cannot be executed before a1

- Fence instructions
  - Order memory operations before and after the fence
  - Inevitable when building concurrent implementations (e.g., mutual exclusion, queues) [Attiya et al., POPL '11]
  - Expensive: Cilk-5's THE protocol spends 50% of its time executing a memory fence [Frigo et al., PLDI '98]

P1: p = new A(…); FENCE; flag = true;
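To see what the fence buys in the p/flag example, a toy model can enumerate every execution the hardware is allowed to produce. The sketch rests on several assumptions that are not from the slides: P1's operations may be permuted freely except across a `FENCE` marker, P2's two loads run in program order, and the allocation is represented by storing the placeholder string `"A"` into `p`. All helper names are hypothetical.

```python
from itertools import combinations, permutations

FENCE = "FENCE"


def reorderings(program):
    """Yield every per-processor order allowed when only fences constrain it.

    Operations may be permuted within a fence-delimited segment but never moved
    across a fence. Data dependences are ignored, which is acceptable here
    because the operations in a segment touch different variables.
    """
    segments, current = [], []
    for op in program:
        if op == FENCE:
            segments.append(current)
            current = []
        else:
            current.append(op)
    segments.append(current)

    def expand(index):
        if index == len(segments):
            yield []
            return
        for head in permutations(segments[index]):
            for tail in expand(index + 1):
                yield list(head) + tail

    yield from expand(0)


def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps the given per-processor orders."""
    n = len(p1) + len(p2)
    for slots in combinations(range(n), len(p1)):
        it1, it2 = iter(p1), iter(p2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]


def can_read_unallocated_p(writer):
    """True if some allowed execution lets P2 see flag set while p is still NULL."""
    reader = [("Ld", "flag"), ("Ld", "p")]  # P2: if (flag) a = p->var
    for order in reorderings(writer):
        for sequence in interleavings(order, reader):
            memory = {"p": None, "flag": False}
            seen = {}
            for kind, var, *value in sequence:
                if kind == "St":
                    memory[var] = value[0]
                else:
                    seen[var] = memory[var]
            if seen["flag"] and seen["p"] is None:
                return True
    return False


unfenced = [("St", "p", "A"), ("St", "flag", True)]
fenced = [("St", "p", "A"), FENCE, ("St", "flag", True)]
print(can_read_unallocated_p(unfenced))  # True
print(can_read_unallocated_p(fenced))    # False
```

Without the fence, the order "store flag, then store p" is allowed, so P2 can observe flag set before p is written. With the fence, that order is excluded.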

Conservativeness of Fences

| P1 | P2 |
|---|---|
| a1: St x | b1: St y |
| Fence1 | Fence2 |
| a2: Ld y | b2: Ld x |

- Fences are inserted statically and conservatively
- At time T, a1 and a2 have completed; b1 and b2 only execute after time T
- No cycle is formed at runtime

Conservativeness of Fences

Example 1: a1 is in a conditional branch

| P1 | P2 |
|---|---|
| if (cond) a1: St x | b1: St y |
| Fence1 | Fence2 |
| a2: Ld y | b2: Ld x |

Example 2: p and q may point to the same memory location

| P1 | P2 |
|---|---|
| a1: St *p | b1: St x |
| Fence1 | Fence2 |
| a2: Ld x | b2: Ld *q |

- Fences are inserted statically and conservatively
- No cycle is formed at runtime

Processor-centric Fence

- Traditional fence
  - Processor-centric: unaware of memory accesses in other processors
- However, the purpose of fences is to
  - Prevent memory accesses from being reordered and observed by other processors (i.e., a cycle formed at runtime)

Address-aware Fences

Consider memory locations accessed around fences at runtime

Fences only take effect when there is a cycle about to happen

Detect and Avoid Cycles

[Figure: Proc 1 executes a1, Fence1, a2 and Proc 2 executes b1, Fence2, b2; the diagram is labeled A1, A2, B1, B2, with edges c1 and c2; c2 is marked "?"]

How to detect c2 efficiently?

Detect and Avoid Cycles

[Figure: the same two-processor diagram (Proc 1: a1, Fence1, a2; Proc 2: b1, Fence2, b2; labels A1, A2, B1, B2; edges c1 and c2), now with a watchlist; c2 is marked "?"]

- How to detect c2 efficiently?
  - Collect a watchlist for each fence
  - A completing memory operation checks the watchlist
    - bypass, if its address is not in the watchlist
    - stall, otherwise
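The bypass-or-stall decision can be sketched as a set lookup. This toy model covers only the check itself. It assumes the watchlist is already available as a set of addresses and does not model how the hardware collects it; the class name `AddressAwareFence` and its `check` method are invented for illustration.

```python
class AddressAwareFence:
    """Toy model of a fence that only stalls operations on watched addresses."""

    def __init__(self, watchlist):
        # Assumption: the watchlist is handed in as a collection of addresses;
        # how it is gathered is outside the scope of this sketch.
        self.watchlist = set(watchlist)

    def check(self, address):
        """Decide what a completing memory operation should do."""
        return "stall" if address in self.watchlist else "bypass"


fence = AddressAwareFence(watchlist={"x"})
print(fence.check("y"))  # bypass
print(fence.check("x"))  # stall
```

In this model, a traditional fence corresponds to a `check` that returns "stall" for every address.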

Performance: Execution Time

Traditional fence (T) vs. Address-aware fence (A)

Fence overhead becomes negligible

Further Reading

L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9):690–691, 1979.

S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29(12):66–76, 1996.

D. Shasha and M. Snir. Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst., 10(2):282–312, 1988.

D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence. Synthesis Lectures on Computer Architecture, 2011.

C. Lin, V. Nagarajan, and R. Gupta. Address-aware fences. ICS '13, pages 313–324, 2013.
