
Consistency


Presentation Transcript


  1. Consistency: Models of computation

  2. Coherence vs. consistency • coherence deals with accesses to the same memory location • consistency addresses the possible outcomes of legal orderings of accesses to all memory locations • the common model (sequential consistency) is easy to understand, but it is difficult to implement and has poor performance

  3. What do you expect? • Sequential consistency: “Commit results in processor order” • simple enough in a uniprocessor • similarly with context switching: just save and restore state • what about multi-threading, or multiprocessor machines?

  4. MIPS R10000 • issues instructions out of order • commits in order • a speculative load may execute and forward its value to later instructions long before it commits in program order • meanwhile, some other processor may commit a store to that location

  5. Producer-consumer
     P1                        P2
     write(A) ;                while (flag != 1) ;
     flag := 1 ;               read(A) ;
     • assumes P1’s writes become visible to P2 in program order
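A minimal C11 sketch of this pattern (the pthreads harness, the value 42, and the function names are illustrative, not from the slides). Declaring the flag as a default, sequentially consistent atomic keeps the compiler and hardware from breaking the ordering assumption the slide states.

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    int A = 0;
    atomic_int flag = 0;    /* default seq_cst ordering on every access */

    /* P1: the data write must become visible before the flag write */
    void *producer(void *arg) {
        A = 42;                           /* write(A)  */
        atomic_store(&flag, 1);           /* flag := 1 */
        return NULL;
    }

    /* P2: spins on the flag, then expects to see the new value of A */
    void *consumer(void *arg) {
        while (atomic_load(&flag) != 1)   /* while (flag != 1) ; */
            ;
        printf("A = %d\n", A);            /* read(A), prints 42  */
        return NULL;
    }

    int main(void) {
        pthread_t p1, p2;
        pthread_create(&p1, NULL, producer, NULL);
        pthread_create(&p2, NULL, consumer, NULL);
        pthread_join(p1, NULL);
        pthread_join(p2, NULL);
        return 0;
    }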

  6. One or both proceed
     P1                        P2
     X := 0 ;                  Y := 0 ;
     ...                       ...
     if (Y == 0) kill P2 ;     if (X == 0) kill P1 ;
     • it’s a race through the critical section

  7. Sequential consistency • results can be mapped to some sequential execution in which the instructions of each process appear in program order • equivalently: • memory operations proceed in program order • all writes are atomic and become visible to all processors at the same time

  8. The need to relax • strict sequential consistency has severe performance drawbacks, so: • keep sequential consistency, and use prefetch and speculation, or • relax the consistency model – and be prepared to think carefully about programs

  9. Attributes of consistency models • system specification • which orders are preserved, and which are not? is there system support to enforce a particular order? • programmer interface: the set of rules that will lead to the expected execution • translation mechanism: how to translate program annotations to hardware actions

  10. Alternative 1 • total store ordering: allows a read to bypass an earlier, incomplete write • helps hide write latency • stronger ordering can be enforced where needed with fence instructions • SPARC v9 provides various memory barrier instructions
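As a rough C11 illustration (the variable names and the use of C fences rather than raw SPARC assembly are assumptions): under total store ordering the load below may bypass the still-buffered store unless a full fence separates them; on SPARC v9 that fence would be membar #StoreLoad.

    #include <stdatomic.h>

    atomic_int X, Y;

    int write_then_read(void) {
        /* The store may sit in the write buffer while the load executes. */
        atomic_store_explicit(&X, 1, memory_order_relaxed);
        /* Full fence: forbids the following load from bypassing the
           earlier store (the C analogue of membar #StoreLoad).           */
        atomic_thread_fence(memory_order_seq_cst);
        return atomic_load_explicit(&Y, memory_order_relaxed);
    }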

  11. Alternative 2 • partial store ordering: allows writes as well as reads to bypass earlier writes • writes cannot bypass reads • writes are still atomic • very different from sequential consistency • e.g. spinning on a flag doesn’t work • needs a store barrier instruction to emulate sequential consistency
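A hedged C11 sketch of the flag idiom under such a model (the variable names and the value 42 are illustrative). The release fence before setting the flag plays the role of the store barrier the slide mentions (membar #StoreStore on SPARC); the acquire fence on the consumer side is the matching load-side barrier needed to make the C version race-free.

    #include <stdatomic.h>

    int A;
    atomic_int flag;

    void producer(void) {
        A = 42;                                     /* data write        */
        atomic_thread_fence(memory_order_release);  /* store barrier     */
        atomic_store_explicit(&flag, 1, memory_order_relaxed);
    }

    int consumer(void) {
        while (atomic_load_explicit(&flag, memory_order_relaxed) != 1)
            ;                                       /* spin on the flag  */
        atomic_thread_fence(memory_order_acquire);  /* load-side barrier */
        return A;                                   /* sees 42           */
    }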

  12. Alternative 3 • processor consistency: same as total store ordering, but does not guarantee atomic writes • implemented in recent Intel processors

  13. Weak ordering • just try to preserve data and control dependencies within a process • don’t worry about the order of memory operations between synchronization points • e.g. don’t worry about the exact order of independent reads and writes within a critical section

  14. Weak ordering • code from outside (before or after) a critical section cannot be reordered with code inside it • code before a barrier must commit before entering, code after a barrier must not issue until the barrier is left • code before a flag wait must commit before waiting, and code after must not issue before flag is set by the producer • code before setting of a flag must commit first, and code after must not issue before the flag is set
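A short pthreads sketch of the barrier rule above (the per-thread computation is made up). pthread_barrier_wait is the synchronization point: each thread's writes before the wait are visible to code that runs after it, and no thread proceeds until all have arrived.

    #include <pthread.h>
    #include <stdio.h>

    enum { NTHREADS = 4 };
    pthread_barrier_t bar;
    int partial[NTHREADS];

    void *worker(void *arg) {
        int id = (int)(long)arg;
        partial[id] = id * id;            /* code before the barrier     */
        pthread_barrier_wait(&bar);       /* must commit before leaving  */
        if (id == 0) {                    /* code after the barrier      */
            int sum = 0;
            for (int i = 0; i < NTHREADS; i++)
                sum += partial[i];        /* sees every thread's write   */
            printf("sum = %d\n", sum);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        pthread_barrier_init(&bar, NULL, NTHREADS);
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        pthread_barrier_destroy(&bar);
        return 0;
    }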

  15. Weak ordering • a good match to modern CPUs and aggressive compiler optimizations • hardware must recognize synchronization, or the compiler must insert proper barriers • MIPS R10000 provides a sync instruction and a fence count register • sync disables issue until the fence register is zero and all outstanding memory operations have committed • the fence count is incremented on an L2 miss and decremented on a reply

  16. Release consistency • relax weak ordering further • categorize all synchronization operations as either acquire or release • acquire is a read (load) on a protected variable, like a lock or a waiting on a flag • release is a write (store) granting access to others, like unlock or setting a flag • barrier is release (arrival) and acquire (departure)
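A minimal C11 spinlock sketch (not from the slides) showing the classification: taking the lock is an acquire read, and unlocking is a release write. Accesses inside the critical section cannot move out past either end, while independent accesses outside remain free to reorder.

    #include <stdatomic.h>

    atomic_flag lock = ATOMIC_FLAG_INIT;
    int shared_counter;

    void lock_acquire(void) {
        /* acquire: a read/modify that gains access to the protected data */
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;                              /* spin until the lock is free */
    }

    void lock_release(void) {
        /* release: a write that grants access to other processors */
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }

    void increment(void) {
        lock_acquire();
        shared_counter++;                  /* critical section */
        lock_release();
    }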

  17. In practice • MIPS processors are sequentially consistent • Sun supports total or partial store ordering • Intel supports processor consistency • Alpha and PowerPC support weak ordering; Power4 and Power5 do not guarantee atomic writes

  18. Processor consistency • a simple model with good performance • writes must become visible to all processors in program order • loads can bypass writes

  19. Back to our examples • Under these rules: • does producer-consumer work? • does one-or-both work?

  20. Results under processor consistency • producer-consumer is okay because P1’s actions are both writes, and writes must become visible in program order • one-or-both can break because loads can bypass writes • in P2, if (X == 0) is a load and Y := 0 is an earlier write, so the load may bypass the write

  21. Intel Itanium • loads are not reordered with other loads • stores are not reordered with other stores • stores are not reordered with older loads • stores to the same location have a total order • a load may be reordered with an older store to a different location

  22. Itanium example 1 • initially, x = y = 0
      P1                  P2
      R1 <- x             R2 <- y         (loads)
      y <- 1              x <- 1          (stores)
      • we will never see R1 = R2 = 1 because stores are not reordered with older loads

  23. Itanium example 2 • initially, x = y = 0
      P1                  P2
      x <- 1              y <- 1          (stores)
      R1 <- y             R2 <- x         (loads)
      • we may see R1 = R2 = 0 because loads may be reordered with older stores
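The same example written as a runnable C11 litmus test (the harness and the names r1/r2 are assumptions). With relaxed atomics both the compiler and the hardware are free to reorder each thread's store with its later load, so r1 = r2 = 0 is a legal outcome; a single run may or may not observe it.

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    atomic_int x, y;
    int r1, r2;

    void *p1(void *arg) {
        atomic_store_explicit(&x, 1, memory_order_relaxed);   /* x <- 1  */
        r1 = atomic_load_explicit(&y, memory_order_relaxed);  /* R1 <- y */
        return NULL;
    }

    void *p2(void *arg) {
        atomic_store_explicit(&y, 1, memory_order_relaxed);   /* y <- 1  */
        r2 = atomic_load_explicit(&x, memory_order_relaxed);  /* R2 <- x */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("r1 = %d, r2 = %d\n", r1, r2);   /* 0,0 is permitted here */
        return 0;
    }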
