
# CS 7810 Lecture 19

CS 7810 Lecture 19. Coherence Decoupling: Making Use of Incoherence. J. Huh, J. Chang, D. Burger, G. Sohi. Proceedings of ASPLOS-XI, October 2004.


Coherence Decoupling: Making Use of Incoherence

J. Huh, J. Chang, D. Burger, G. Sohi

Proceedings of ASPLOS-XI

October 2004

• Coherence guarantees (i) that a write will eventually be seen by other processors, and (ii) write serialization (all processors see writes to the same location in the same order)

• The consistency model defines the ordering of writes and reads to different memory locations: the hardware guarantees a certain consistency model, and the programmer attempts to write correct programs under those assumptions

Initially, A = B = 0

| P1 | P2 |
|---|---|
| A = 1 | B = 1 |
| if (B == 0) | if (A == 0) |
| critical section | critical section |
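The example above can be checked by brute force. The following is a minimal sketch, assuming sequential consistency (every execution is some interleaving of the two programs that preserves each processor's program order). The helper names `interleavings` and `run` are illustrative and not from the slides.

```python
from itertools import combinations

# Each processor's program, in program order.
# ("write", var) sets var to 1; ("read", var) tests whether var is still 0.
P1 = [("write", "A"), ("read", "B")]
P2 = [("write", "B"), ("read", "A")]


def interleavings(n1, n2):
    """Yield every merge of two programs that preserves program order."""
    total = n1 + n2
    for p1_slots in combinations(range(total), n1):
        slots = set(p1_slots)
        yield ["P1" if i in slots else "P2" for i in range(total)]


def run(schedule):
    """Execute one sequentially consistent schedule; report who enters."""
    memory = {"A": 0, "B": 0}
    programs = {"P1": iter(P1), "P2": iter(P2)}
    enters = {"P1": False, "P2": False}
    for proc in schedule:
        op, var = next(programs[proc])
        if op == "write":
            memory[var] = 1
        else:
            # The processor enters the critical section only if it reads 0.
            enters[proc] = memory[var] == 0
    return enters["P1"], enters["P2"]


outcomes = {run(s) for s in interleavings(len(P1), len(P2))}
both_enter = (True, True) in outcomes
print(sorted(outcomes), "both enter:", both_enter)
```

Under this assumption no interleaving lets both processors enter the critical section. If the hardware allows a read to bypass the earlier write to a different location, that guarantee no longer follows from the program text alone.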

Initially, A = B = 0

| P1 | P2 | P3 |
|---|---|---|
| A = 1 | | |
| | if (A == 1) | |
| | B = 1 | |
| | | if (B == 1) |
| | | register = A |
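The three-processor example can be explored the same way. This is a sketch under the assumption of sequential consistency, with each `if` and its dependent statement modeled as two separate steps. The function name `run` and the step encoding are illustrative.

```python
from itertools import permutations


def run(schedule):
    """Execute one interleaving; return the value P3 loads (None if it never loads)."""
    mem = {"A": 0, "B": 0}
    saw_a = saw_b = False
    register = None
    step = {"P1": 0, "P2": 0, "P3": 0}
    for proc in schedule:
        i = step[proc]
        step[proc] += 1
        if proc == "P1":
            mem["A"] = 1                 # A = 1
        elif proc == "P2":
            if i == 0:
                saw_a = mem["A"] == 1    # if (A == 1)
            elif saw_a:
                mem["B"] = 1             #   B = 1
        else:
            if i == 0:
                saw_b = mem["B"] == 1    # if (B == 1)
            elif saw_b:
                register = mem["A"]      #   register = A
    return register


# Every ordering of the five steps that keeps each processor's steps in order.
schedules = set(permutations(["P1", "P2", "P2", "P3", "P3"]))
results = {run(s) for s in schedules}
print(results)
```

In every interleaving where P3 sees B == 1, it loads A == 1. Reading 0 would require P3 to observe P2's write without observing the write by P1 that caused it, which a sequentially consistent system never allows.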

| P1 | P2 |
|---|---|
| Data = 2000 | while (Head == 0) |
| | … = Data |

• Caches share a bus; every cache sees each transaction in the same cycle; every cache manages itself

• When one cache writes to a block, every other cache invalidates its copy of that block

• When a cache has a read miss, the block is provided by memory or the last writer

• Protocols are defined by states: MSI, MESI, MOESI
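The snooping rules above can be sketched as a toy MSI model. This is an illustration only: the `Cache` and `Bus` classes and their methods are hypothetical names, each block holds a single value, and timing, evictions, and bus arbitration are ignored.

```python
class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.state = {}  # block -> "M" or "S"; a missing entry means "I"
        self.data = {}
        bus.caches.append(self)

    def read(self, block):
        if self.state.get(block, "I") == "I":
            # Read miss: memory or the last writer provides the block.
            self.data[block] = self.bus.read_miss(self, block)
            self.state[block] = "S"
        return self.data[block]

    def write(self, block, value):
        if self.state.get(block, "I") != "M":
            # Every other cache invalidates its copy of the block.
            self.bus.invalidate(self, block)
            self.state[block] = "M"
        self.data[block] = value

    def snoop_read(self, block):
        """Another cache missed: supply the data if we are the last writer."""
        if self.state.get(block, "I") == "M":
            self.state[block] = "S"
            return self.data[block]
        return None

    def snoop_invalidate(self, block):
        self.state[block] = "I"


class Bus:
    """Every cache sees every transaction placed on the bus."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def read_miss(self, requester, block):
        for cache in self.caches:
            if cache is not requester:
                value = cache.snoop_read(block)
                if value is not None:
                    self.memory[block] = value  # write back on downgrade
                    return value
        return self.memory[block]

    def invalidate(self, requester, block):
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_invalidate(block)


memory = {"X": 0}
bus = Bus(memory)
c0, c1 = Cache("c0", bus), Cache("c1", bus)
c0.read("X")
c1.read("X")                      # both caches hold X in state S
c0.write("X", 7)                  # c0 moves to M, c1 is invalidated
states_after_write = (c0.state["X"], c1.state["X"])
value = c1.read("X")              # read miss served by the last writer, c0
print(states_after_write, value)
```

MESI and MOESI add states to this picture (Exclusive, Owned); the sketch covers only the three MSI states.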

[Figure: four processors, each with its own caches, and a shared memory]

• A directory keeps track of the sharing status of each block

• Every request goes to the directory and the directory then sends directives to each cache: the directory is the point of serialization (just as the bus is, in a snooping protocol)

• For example, on a write, the request reaches the directory, the directory sends invalidates to other sharers, and permissions are granted to the writer
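The write example above can be sketched as a small directory model. The `Directory` class, its method names, and the message tuples are assumptions made for illustration. The sketch omits data forwarding, downgrades of a previous writer, and acknowledgements.

```python
class Directory:
    """Tracks which caches hold each block and serializes all requests."""

    def __init__(self):
        self.sharers = {}   # block -> set of cache ids holding a copy
        self.messages = []  # directives sent, in serialization order

    def read(self, cache_id, block):
        # Record the new sharer and send it the data.
        self.sharers.setdefault(block, set()).add(cache_id)
        self.messages.append(("data", cache_id, block))

    def write(self, cache_id, block):
        # Invalidate every other sharer, then grant write permission.
        others = self.sharers.get(block, set()) - {cache_id}
        for other in sorted(others):
            self.messages.append(("invalidate", other, block))
        self.sharers[block] = {cache_id}
        self.messages.append(("grant_write", cache_id, block))


directory = Directory()
for cache_id in (0, 1, 2):
    directory.read(cache_id, "X")
directory.write(1, "X")
print(directory.messages[-3:])
print(directory.sharers["X"])
```

Because every request passes through the one `Directory` object, the order of entries in `messages` is the order in which all caches observe the operations on a block.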

[Figure: four processors, each with its own caches, connected by a network to memory and its directory]

• A certain ordering of reads and writes is assumed; if that ordering is violated, the thread is re-executed

• The coherence protocol is used to propagate writes

[Figure: four caches and a shared memory]

• No thread is speculative: a parallel application with synchronization points and parallel regions is guaranteed to execute correctly, with no need for re-execution

• Threads wait at synchronization points and wait for the correct permissions for every block of data

[Figure: four caches and a shared memory]

• A simple coherence protocol is often a slow protocol; for example, a simple protocol may not allow multiple outstanding requests

• Coherence decoupling: maintain a fast but incorrect protocol alongside a slow but correct backing protocol; this incurs fewer stalls in the common case, at the cost of occasional recoveries

• A coherence operation is broken into two components: (i) acquiring and using the value, (ii) receiving the correct set of permissions

• Why does speculative cache look-up work?

  • False sharing: a line was invalidated, but a different word was written to

  • Silent stores or value locality

  • If there is spare bandwidth, updated values can be pushed out to sharers

• The Miss Status Holding Register (MSHR) keeps track of outstanding requests; it can buffer the speculative value and ensure it matches the correct value. On a mis-speculation, that instruction is treated like a branch mis-predict

• Speculation on a coherence operation is no different from traditional forms of speculation
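The speculate-then-verify flow described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: `MSHR`, `MSHREntry`, `speculate`, and `resolve` are hypothetical names, and recovery is reduced to a counter. The usage lines show the false-sharing case, where the stale word still matches, next to a case where the requested word was actually written.

```python
from dataclasses import dataclass


@dataclass
class MSHREntry:
    address: int
    speculative_value: int  # stale value read from the invalidated line


class MSHR:
    """Buffers speculative values for outstanding coherence requests."""

    def __init__(self):
        self.entries = {}
        self.recoveries = 0

    def speculate(self, address, stale_value):
        """Miss on an invalid line: use the stale value now, verify later."""
        self.entries[address] = MSHREntry(address, stale_value)
        return stale_value

    def resolve(self, address, correct_value):
        """The backing protocol has delivered the correct value."""
        entry = self.entries.pop(address)
        if entry.speculative_value == correct_value:
            return True        # speculation held; nothing to undo
        self.recoveries += 1   # squash and replay, like a branch mis-predict
        return False


stale_line = [10, 20]    # local copy, invalidated by a remote write
correct_line = [10, 21]  # the remote processor wrote only word 1

mshr = MSHR()
mshr.speculate(0, stale_line[0])
ok_word0 = mshr.resolve(0, correct_line[0])  # false sharing: value matches
mshr.speculate(1, stale_line[1])
ok_word1 = mshr.resolve(1, correct_line[1])  # word was written: recovery
print(ok_word0, ok_word1, mshr.recoveries)
```

The two calls mirror the two components of a coherence operation: `speculate` acquires and uses a value early, and `resolve` corresponds to the arrival of the correct value and permissions.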

• Arguments for coherence decoupling:

• Reduces protocol complexity

• Reduces programming complexity

• Coherence misses will emerge as greater bottlenecks?

• What is the expected trend for CMPs?
