CS 162 Memory Consistency Models



Reordering in Uniprocessors

  • Memory operations are reordered to improve performance

    • Hardware (e.g., store buffer, reorder buffer)

    • Compiler (e.g., code motion, caching value in register)

  • The reordered code behaves the same as long as dependences are respected

a1: St x            a2: Ld y
a2: Ld y     ≡      a1: St x

(St x and Ld y access different locations, so there is no dependence between them.)
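A minimal C++ sketch of the equivalence above. It assumes a single processor with no other observers. The hypothetical `State` struct stands for memory (x, y) plus the register r that receives the load. Because St x and Ld y touch different locations, both orders end in the same state. The contrast case shows why a load from the *same* location, which is a true dependence, may not be swapped.

```cpp
#include <cassert>

// Hypothetical model of one processor's state: two distinct memory
// locations x and y, plus a register r that receives the load.
struct State {
    int x = 0;
    int y = 0;
    int r = 0;
};

// Program order: a1: St x, then a2: Ld y.
State store_then_load(State s, int v) {
    s.x = v;    // a1: St x
    s.r = s.y;  // a2: Ld y
    return s;
}

// Reordered: a2: Ld y, then a1: St x.
State load_then_store(State s, int v) {
    s.r = s.y;  // a2: Ld y
    s.x = v;    // a1: St x
    return s;
}

// With no other processor watching, both orders give the same final state.
bool same_result(State init, int v) {
    State a = store_then_load(init, v);
    State b = load_then_store(init, v);
    return a.x == b.x && a.y == b.y && a.r == b.r;
}

// Contrast: if a2 loaded x instead of y, it would depend on a1,
// and swapping the two would change the value that is loaded.
int dependent_in_order(State s, int v)  { s.x = v; return s.x; }
int dependent_reordered(State s, int v) { int r = s.x; s.x = v; return r; }
```

For example, `same_result(State{0, 5, 0}, 1)` holds. By contrast, `dependent_in_order(State{}, 1)` returns 1 while `dependent_reordered(State{}, 1)` returns 0.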



Reordering in Multiprocessors

  • Counter-intuitive program behavior

Initially x = y = 0

P1                P2
a1: x = 1;        b1: Ry = y;
a2: y = 1;        b2: Rx = x;

Possible outcomes

(Rx=0, Ry=0)

(Rx=0, Ry=1)

(Rx=1, Ry=0)

(Rx=1, Ry=1)

Intuitively, y=1 ⇒ x=1: if P2 reads Ry = 1, then a1 has already executed, so P2 should also read Rx = 1. The outcome (Rx=0, Ry=1) is only possible if a1/a2 or b1/b2 are reordered.
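The four outcomes can be checked mechanically. The C++ sketch below runs every interleaving of P1 and P2 against a single shared memory. The function name `outcomes` is illustrative and does not come from the slides. When program order is respected, the sketch never yields (Rx=0, Ry=1). When P1 is modelled as issuing a2 before a1, as a stand-in for hardware or compiler reordering, that outcome appears.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Every (Rx, Ry) result reachable by interleaving
//   P1: a1: x = 1;  a2: y = 1;
//   P2: b1: Ry = y; b2: Rx = x;
// on one shared memory. If p1_reordered is true, P1 issues a2 before a1,
// modelling a hardware/compiler reordering of its two stores.
std::set<std::pair<int, int>> outcomes(bool p1_reordered) {
    enum { A1, A2, B1, B2 };
    const std::vector<int> p1 = p1_reordered ? std::vector<int>{A2, A1}
                                             : std::vector<int>{A1, A2};
    const std::vector<int> p2 = {B1, B2};

    std::set<std::pair<int, int>> result;
    // 0 = "P1 runs its next op", 1 = "P2 runs its next op".
    // Starting from the sorted sequence, next_permutation visits
    // all 6 distinct interleavings exactly once.
    std::vector<int> schedule = {0, 0, 1, 1};
    do {
        int x = 0, y = 0, Rx = 0, Ry = 0;
        std::size_t i1 = 0, i2 = 0;
        for (int who : schedule) {
            int op = (who == 0) ? p1[i1++] : p2[i2++];
            switch (op) {
                case A1: x = 1;  break;
                case A2: y = 1;  break;
                case B1: Ry = y; break;
                case B2: Rx = x; break;
            }
        }
        result.emplace(Rx, Ry);
    } while (std::next_permutation(schedule.begin(), schedule.end()));
    return result;
}
```

`outcomes(false)` contains exactly three pairs and omits (0, 1). `outcomes(true)` includes (0, 1), for example via the order a2, b1, b2, a1. Reordering b1/b2 on P2 would expose the same outcome; the sketch only toggles P1 to keep it short.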



Reordering in Multiprocessors

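  • Counter-intuitive program behavior

Initially p = NULL, flag = false

P1                    P2
p = new A(…);         if (flag)
flag = true;              a = p->var;

flag is supposed to be set after p is allocated. If the two stores on P1 (or the two loads on P2) are reordered, P2 can see flag == true and still read p == NULL.

  • Lock-free algorithms are affected, e.g., Dekker, Peterson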
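In C++ the slide's intent can be expressed with `std::atomic<bool>` and release/acquire ordering. This is a sketch under stated assumptions. `A` is a stand-in class with one `int var` field. P2 spins until the flag is set instead of testing it once, so that the result is deterministic. `run_publication` is a hypothetical test driver. The release store keeps `p = new A(…)` ordered before `flag = true`. The acquire load keeps the flag check ordered before `p->var`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct A {
    int var;
    explicit A(int v) : var(v) {}
};

A* p = nullptr;                  // plain (non-atomic) pointer, as on the slide
std::atomic<bool> flag{false};   // the publication flag

// P1: allocate and initialise the object, then publish it.
// The release store cannot be reordered before the write to p.
void producer(int value) {
    p = new A(value);
    flag.store(true, std::memory_order_release);
}

// P2: wait for the flag, then read through p. The acquire load makes
// everything written before the release store visible here.
int consumer() {
    while (!flag.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return p->var;
}

int run_publication(int value) {
    p = nullptr;
    flag.store(false);
    int seen = 0;
    std::thread t2([&] { seen = consumer(); });
    std::thread t1([&] { producer(value); });
    t1.join();
    t2.join();
    delete p;
    return seen;
}
```

`run_publication(42)` always returns 42. With a plain `bool flag`, the program would have a data race and no such guarantee.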



Reordering in Multiprocessors

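  • Dekker Algorithm (mutual exclusion)

  • Counter-intuitive program behavior

Initially flag1 = flag2 = 0

P1                        P2
flag1 = 1;                flag2 = 1;
if (flag2 == 0)           if (flag1 == 0)
    critical section          critical section

Each processor executes St of its own flag followed by Ld of the other flag. After reordering (e.g., the store is still sitting in a store buffer when the load executes), both loads can return 0, so both processors enter the critical section.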
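Below is a sketch of Dekker's full algorithm in C++ `std::atomic`, including the usual `turn` variable that the slide's simplified version omits. All operations use the default `memory_order_seq_cst`, which rules out the St flag → Ld flag reordering described above. The names `wants`, `turn`, `lock`, `unlock`, and `run_dekker` are illustrative.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// wants[i] plays the role of flag1/flag2 on the slide.
std::atomic<bool> wants[2] = {{false}, {false}};
std::atomic<int> turn{0};

void lock(int i) {
    int j = 1 - i;
    wants[i].store(true);                 // St own flag (seq_cst)
    while (wants[j].load()) {             // Ld other flag (seq_cst)
        if (turn.load() != i) {
            wants[i].store(false);        // back off and wait for our turn
            while (turn.load() != i) std::this_thread::yield();
            wants[i].store(true);
        }
    }
}

void unlock(int i) {
    turn.store(1 - i);                    // hand priority to the other thread
    wants[i].store(false);
}

// Both threads increment a plain counter inside the critical section;
// if mutual exclusion holds, no increment is lost.
long run_dekker(int iterations) {
    long counter = 0;
    auto worker = [&](int id) {
        for (int k = 0; k < iterations; ++k) {
            lock(id);
            ++counter;
            unlock(id);
        }
    };
    std::thread t0(worker, 0), t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

If `wants` were accessed with `memory_order_relaxed` instead, the store/load pair in `lock` could be reordered as on the slide, and increments could be lost.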



Memory Consistency Models

  • Specify the ordering of loads and stores to different memory locations

    • Ld → Ld, Ld → St, St → Ld, St → St

  • Contract between hardware, compiler, and programmer

    • The hardware and compiler will not violate the specified ordering

    • The programmer will not assume a stricter order than the model provides
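One way to make the contract concrete is to treat a model as the set of the four orderings it promises to preserve. The C++ below is a toy sketch. `kRelaxStLd` is a hypothetical model that relaxes only St → Ld (the reordering behind the Dekker example) and is not a specification of any particular processor.

```cpp
#include <cassert>
#include <set>
#include <utility>

enum class Op { Ld, St };

// A model, in this sketch, is the set of program-order pairs
// (first, second) on different addresses that it preserves.
using Model = std::set<std::pair<Op, Op>>;

// Sequential consistency preserves all four orderings.
const Model kSC = {{Op::Ld, Op::Ld}, {Op::Ld, Op::St},
                   {Op::St, Op::Ld}, {Op::St, Op::St}};

// Hypothetical weaker model: everything except St -> Ld is preserved.
const Model kRelaxStLd = {{Op::Ld, Op::Ld}, {Op::Ld, Op::St},
                          {Op::St, Op::St}};

// May 'second' become visible before 'first' under model m?
bool may_reorder(const Model& m, Op first, Op second) {
    return m.count(std::make_pair(first, second)) == 0;
}
```

Under `kSC`, `may_reorder` is false for all four pairs. Under `kRelaxStLd`, it is true only for (St, Ld).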



Memory Consistency Models

(Figure: programmability vs. performance, each axis running from Low to High.)

Stronger models:

  • Easier to reason about (higher programmability)

  • Fewer memory reorderings, i.e., stronger constraints

  • Lower performance



Cache Coherence vs. Memory Model

  • Cache coherence ensures a consistent view of memory

    • Guarantees that an update to memory by one processor is eventually seen by the other processors

  • But how consistent?

    • NO guarantees on when an update should be seen

    • NO guarantees on the order in which updates to different locations should be seen



Cache Coherence vs. Memory Model

Initially A = B = 0

P1            P2                    P3
A = 1;        while (A != 1) ;
              B = 1;                while (B != 1) ;
                                    tmp = A;

tmp = 1? or tmp = 0?
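Coherence alone does not settle this question; the memory model does. The sketch below assumes that A and B are C++ `std::atomic<int>` with the default sequentially consistent ordering; the slide uses plain variables. Under that assumption P3 must read tmp = 1: P2 writes B only after seeing A == 1, and P3 reads A only after seeing B == 1. `run_three_procs` is a hypothetical driver.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the three-processor example once and returns P3's tmp.
int run_three_procs() {
    std::atomic<int> A{0}, B{0};
    int tmp = -1;
    std::thread p3([&] {
        while (B.load() != 1) std::this_thread::yield();  // while (B != 1) ;
        tmp = A.load();                                   // tmp = A;
    });
    std::thread p2([&] {
        while (A.load() != 1) std::this_thread::yield();  // while (A != 1) ;
        B.store(1);                                       // B = 1;
    });
    std::thread p1([&] { A.store(1); });                  // A = 1;
    p1.join();
    p2.join();
    p3.join();
    return tmp;
}
```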



Sequential Consistency (SC)

(Figure: processors P1, P2, P3, …, Pn all connected to a single MEMORY.)

  • Definition [Lamport]

    • (1) the result of any execution is the same as if the operations of all processors were executed in some sequential order;

    • (2) the operations of each individual processor appear in this sequence in the order specified by its program.

  • Behaves as if the following steps were repeated:

    • Pick a processor by any method (e.g., randomly)

    • That processor completes its next load/store operation, in program order
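The pick-a-processor rule translates directly into a small simulator. In this C++ sketch, the `Instr` encoding and `run_sc` are hypothetical. At each step the simulator picks, at random, a processor that still has work, and that processor completes its next load or store against one shared memory. Running the earlier x/y program under many seeds never produces Rx = 0 with Ry = 1.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

struct Instr {
    bool is_store;
    std::string addr;   // shared memory location
    int value;          // value to store (stores only)
    std::string reg;    // destination register (loads only)
};

using Program = std::vector<Instr>;
using Regs = std::map<std::string, int>;

Regs run_sc(const std::vector<Program>& procs, unsigned seed) {
    std::mt19937 rng(seed);
    std::map<std::string, int> memory;   // single shared memory; unset reads 0
    Regs regs;
    std::vector<std::size_t> pc(procs.size(), 0);
    while (true) {
        std::vector<std::size_t> ready;  // processors with operations left
        for (std::size_t p = 0; p < procs.size(); ++p)
            if (pc[p] < procs[p].size()) ready.push_back(p);
        if (ready.empty()) break;
        std::uniform_int_distribution<std::size_t> pick(0, ready.size() - 1);
        std::size_t p = ready[pick(rng)];        // pick a processor
        const Instr& in = procs[p][pc[p]++];     // its next op, in program order
        if (in.is_store) memory[in.addr] = in.value;
        else regs[in.reg] = memory[in.addr];
    }
    return regs;
}
```

Because each step completes one whole operation against one memory, every run is one of the sequential orders allowed by Lamport's definition.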



SC Example

P1                P2
a1: x = 1;        b1: Ry = y;
a2: y = 1;        b2: Rx = x;

Interleavings allowed under SC and their outcomes:

a1, a2, b1, b2  →  (Rx=1, Ry=1)

a1, b1, a2, b2  →  (Rx=1, Ry=0)

a1, b1, b2, a2  →  (Rx=1, Ry=0)

b1, a1, a2, b2  →  (Rx=1, Ry=0)

b1, a1, b2, a2  →  (Rx=1, Ry=0)

b1, b2, a1, a2  →  (Rx=0, Ry=0)

None of them produces (Rx=0, Ry=1).



Sequential Consistency (SC)

  • Simple and intuitive

    • consistent with programmers’ intuition

    • easy to reason about program behavior

  • However, the simplicity comes at the cost of performance

    • prevents aggressive compiler optimizations (e.g., load reordering, store reordering, caching values in registers)

    • constrains hardware utilization (e.g., store buffer)



SC Violation

P1                P2
a1: x = 1         b1: R1 = y
a2: y = 1         b2: R2 = x

(Figure: program order edges within each processor and conflict relation edges between accesses to the same location.)

- SC violation: a cycle formed by program orders and conflict orders [Shasha and Snir, 1988], e.g., (a2, b1, b2, a1, a2)

- Executing in the order (a2, b1, b2, a1) produces R1=1, R2=0, which is not an SC outcome

Insert fences to break the cycle

- a2 cannot be executed before a1
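The cycle criterion can be checked on a recorded execution. This C++ sketch is a dynamic check: it orients each conflict edge by the observed execution order. It is not Shasha and Snir's full static analysis. The `Access` struct and `has_sc_violation` are hypothetical names. Program-order edges connect accesses of the same processor. Conflict edges connect accesses by different processors to the same address where at least one access is a store. Any cycle in the resulting graph means the execution is not SC.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Access {
    int proc;          // processor id
    int po_index;      // position in that processor's program
    bool is_store;
    std::string addr;
};

// 'exec' lists the accesses in the order they were performed.
bool has_sc_violation(const std::vector<Access>& exec) {
    std::size_t n = exec.size();
    std::vector<std::vector<std::size_t>> succ(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const Access& a = exec[i];
            const Access& b = exec[j];
            bool program_order = a.proc == b.proc && a.po_index < b.po_index;
            bool conflict_order = a.proc != b.proc && a.addr == b.addr &&
                                  (a.is_store || b.is_store) && i < j;
            if (program_order || conflict_order) succ[i].push_back(j);
        }
    }
    // DFS with colours: 0 = unvisited, 1 = on the current path, 2 = done.
    std::vector<int> colour(n, 0);
    std::function<bool(std::size_t)> dfs = [&](std::size_t u) {
        colour[u] = 1;
        for (std::size_t v : succ[u]) {
            if (colour[v] == 1) return true;          // back edge: cycle found
            if (colour[v] == 0 && dfs(v)) return true;
        }
        colour[u] = 2;
        return false;
    };
    for (std::size_t u = 0; u < n; ++u)
        if (colour[u] == 0 && dfs(u)) return true;
    return false;
}
```

With a1 = {0, 0, true, "x"}, a2 = {0, 1, true, "y"}, b1 = {1, 0, false, "y"}, and b2 = {1, 1, false, "x"}, the order (a2, b1, b2, a1) is reported as a violation. The order (a1, a2, b1, b2) is not.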



Fence Instructions

  • Order memory operations before and after the fence

P1

p = new A(…)

FENCE

flag = true;

  • Inevitable when building concurrent implementations (e.g., mutual exclusion, queues) [Attiya et al., POPL’11]

  • Expensive: Cilk-5’s THE protocol spends 50% of its time executing a memory fence [Frigo et al., PLDI’98]
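C++ exposes a portable FENCE as `std::atomic_thread_fence`. The sketch below places a `memory_order_seq_cst` fence in the store-then-load pattern: a1: St x; Fence1; a2: Ld y on P1, and b1: St y; Fence2; b2: Ld x on P2. This is the same shape as the Dekker example. With both fences present, the C++ memory model forbids both loads returning 0. With relaxed atomics and no fences, that outcome would be allowed. `both_read_zero` is a hypothetical driver.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};

// Runs the store-fence-load pattern once on two threads and reports
// whether both loads returned 0 (the non-SC outcome).
bool both_read_zero() {
    x.store(0, std::memory_order_relaxed);
    y.store(0, std::memory_order_relaxed);
    int r1 = -1, r2 = -1;
    std::thread p1([&] {
        x.store(1, std::memory_order_relaxed);                // a1: St x
        std::atomic_thread_fence(std::memory_order_seq_cst);  // Fence1
        r1 = y.load(std::memory_order_relaxed);               // a2: Ld y
    });
    std::thread p2([&] {
        y.store(1, std::memory_order_relaxed);                // b1: St y
        std::atomic_thread_fence(std::memory_order_seq_cst);  // Fence2
        r2 = x.load(std::memory_order_relaxed);               // b2: Ld x
    });
    p1.join();
    p2.join();
    return r1 == 0 && r2 == 0;
}
```

This is the processor-centric fence discussed below: each fence orders its own thread's accesses unconditionally, whether or not the other thread touches the same addresses.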



Conservativeness of Fences

P1                P2
a1: St x          b1: St y
Fence1            Fence2
a2: Ld y          b2: Ld x

  • At time T, a1 and a2 have completed; b1 and b2 only execute after time T.

  • No cycle is formed at runtime, so the fences were not needed in this execution

  • Yet fences are inserted statically and conservatively, so their cost is paid anyway



Conservativeness of Fences

Case 1: a1 is in a conditional branch

P1                P2
if (cond)         b1: St y
  a1: St x        Fence2
Fence1            b2: Ld x
a2: Ld y

Case 2: p and q may point to the same memory location

P1                P2
a1: St *p         b1: St x
Fence1            Fence2
a2: Ld x          b2: Ld *q

  • No cycle is formed at runtime when cond is false, or when p and q point to different locations

  • Fences are still inserted statically and conservatively



Processor-centric Fence

  • Traditional fence

    • Processor-centric: unaware of memory accesses on other processors

  • However, the purpose of fences is to

    • prevent memory accesses from being reordered in a way that other processors can observe (i.e., a cycle formed at runtime)



Address-aware Fences

  • Consider the memory locations accessed around fences at runtime

  • A fence takes effect only when a cycle is about to form



Detect and Avoid Cycles

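(Figure: Proc 1 executes a1: …, Fence1, a2: …, with regions labelled A1 and A2; Proc 2 executes b1: …, Fence2, b2: …, with regions labelled B1 and B2. Conflict edges c1 and c2 connect the two processors; c2 is marked "?".)

How to detect c2 efficiently?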



Detect and Avoid Cycles

(Figure: the two-processor diagram from the previous slide, now with a watchlist attached to the fences; conflict edge c2 is still marked "?".)

  • How to detect c2 efficiently?

    • Collect a watchlist for each fence

    • A completing memory operation checks the watchlist:

      - bypass, if its address is not in the watchlist

      - stall, otherwise
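The watchlist check can be sketched in software. This C++ is a hypothetical illustration of the bypass/stall rule above, not the hardware design from the address-aware fences paper. How the watchlist is collected is left out. `PendingFence`, `Action`, and `on_complete` are illustrative names. Each pending fence carries a set of watched addresses. A completing memory operation stalls if its address is watched by any pending fence and bypasses otherwise.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

using Addr = std::uintptr_t;

struct PendingFence {
    std::unordered_set<Addr> watchlist;  // addresses collected for this fence
};

enum class Action { Bypass, Stall };

// Decide what a completing memory operation to 'addr' must do.
Action on_complete(Addr addr, const std::vector<PendingFence>& fences) {
    for (const PendingFence& f : fences) {
        if (f.watchlist.count(addr) != 0) {
            return Action::Stall;    // watched address: enforce the fence
        }
    }
    return Action::Bypass;           // not watched: treated as safe to proceed
}
```

Only accesses to watched addresses pay the fence's cost; all others proceed, which is the source of the savings shown on the next slide.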



Performance: Execution Time

(Figure: execution time with traditional fences (T) vs. address-aware fences (A).)

With address-aware fences, fence overhead becomes negligible.



Further Reading

L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9):690–691, 1979.

S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29(12):66–76, 1996.

D. Shasha and M. Snir. Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst., 10(2):282–312, 1988.

D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence. Synthesis Lectures on Computer Architecture, 2011.

C. Lin, V. Nagarajan, and R. Gupta. Address-aware fences. ICS ’13, pages 313–324, 2013.

