CS 7810 Lecture 19


Coherence Decoupling: Making Use of Incoherence

J. Huh, J. Chang, D. Burger, G. Sohi

Proceedings of ASPLOS-XI, October 2004



Coherence / Consistency

• Coherence guarantees (i) that a write will eventually be seen by other processors, and (ii) write serialization (all processors see writes to the same location in the same order)
• The consistency model defines the ordering of writes and reads to different memory locations; the hardware guarantees a certain consistency model and the programmer attempts to write correct programs with those assumptions

Consistency Examples

Initially, A = B = 0

P1: A = 1; if (B == 0) critical section

P2: B = 1; if (A == 0) critical section
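The first example can be checked by brute force. The sketch below is an illustration, not part of the lecture: it enumerates every interleaving of the two processors' operations over a single shared memory, which is one way to model sequential consistency. The names `interleavings`, `run`, and `both_can_enter` are hypothetical helpers, and the "relaxed" case is an assumed model in which each read is performed before the earlier write of the same processor.

```python
from itertools import combinations

# Program order of each processor in the first example.
P1 = [("write", "A"), ("read", "B")]
P2 = [("write", "B"), ("read", "A")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        ops_a, ops_b = iter(a), iter(b)
        yield [(1, next(ops_a)) if i in slots else (2, next(ops_b))
               for i in range(total)]

def run(schedule):
    """Run one interleaving on a single shared memory; return who enters."""
    memory = {"A": 0, "B": 0}
    entered = set()
    for proc, (kind, var) in schedule:
        if kind == "write":
            memory[var] = 1
        elif memory[var] == 0:  # the `if (X == 0)` test succeeds
            entered.add(proc)
    return frozenset(entered)

def both_can_enter(p1, p2):
    """True if some interleaving lets both processors into the critical section."""
    return any(run(s) == {1, 2} for s in interleavings(p1, p2))

# Program order preserved (sequentially consistent model).
sc_result = both_can_enter(P1, P2)
# Assumed relaxed model: each read is performed before the earlier write.
relaxed_result = both_can_enter(P1[::-1], P2[::-1])

print("both enter under SC:", sc_result)
print("both enter when reads bypass writes:", relaxed_result)
```

Under the sequentially consistent model no interleaving admits both processors, while the assumed reordering does, which is why the programmer needs to know which consistency model the hardware guarantees.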

Initially, A = B = 0

P1: A = 1

P2: if (A == 1) B = 1

P3: if (B == 1) register = A

P1: Data = 2000; Head = 1

P2: while (Head == 0) { }; … = Data

Snooping-Based Cache Coherence

• Caches share a bus; every cache sees each transaction in the same cycle; every cache manages itself
• When one cache writes to a block, every other cache invalidates its copy of that block
• When a cache has a read miss, the block is provided by memory or the last writer
• Protocols are defined by states: MSI, MESI, MOESI
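The invalidate-on-write and supply-on-miss behaviour above can be sketched as a toy MSI model. This is a minimal illustration under simplifying assumptions, not the protocol from any particular machine: a block holds a single value, the bus is a plain Python object, and the class names `Cache` and `Bus` are invented for the example.

```python
class Cache:
    def __init__(self, name):
        self.name = name
        self.state = {}  # block address -> "M", "S" or "I"
        self.data = {}   # block address -> cached value

    def state_of(self, addr):
        return self.state.get(addr, "I")

class Bus:
    """Shared bus: every cache observes (snoops) every transaction."""
    def __init__(self, caches):
        self.caches = caches
        self.memory = {}  # block address -> value, default 0

    def read(self, cache, addr):
        if cache.state_of(addr) != "I":
            return cache.data[addr]  # read hit, no bus transaction
        # Read miss: the last writer, if any, supplies the block.
        for other in self.caches:
            if other is not cache and other.state_of(addr) == "M":
                self.memory[addr] = other.data[addr]  # write back
                other.state[addr] = "S"               # downgrade the owner
        cache.data[addr] = self.memory.get(addr, 0)
        cache.state[addr] = "S"
        return cache.data[addr]

    def write(self, cache, addr, value):
        # Every other cache invalidates its copy of the block.
        for other in self.caches:
            if other is not cache:
                other.state[addr] = "I"
        cache.data[addr] = value
        cache.state[addr] = "M"

c0, c1 = Cache("c0"), Cache("c1")
bus = Bus([c0, c1])
bus.write(c0, 0x40, 5)       # c0 holds the block in M
first = bus.read(c1, 0x40)   # c0 supplies the value; both end in S
bus.write(c1, 0x40, 7)       # c0 is invalidated; c1 holds the block in M
second = bus.read(c0, 0x40)  # c1 supplies the new value
```

Because a block is a single value here, a write overwrites the whole block and never needs to fetch the old contents first; a real protocol with multi-word lines would.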

[Figure: four processors, each with its own caches, connected to a shared memory]

Directory-Based Cache Coherence

• A directory keeps track of the sharing status of each block
• Every request goes to the directory and the directory then sends directives to each cache; the directory is the point of serialization (just as the bus is, in a snooping protocol)
• For example, on a write, the request reaches the directory, the directory sends invalidates to other sharers, and permissions are granted to the writer
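The write example above can be sketched as follows. This is a simplified illustration, assuming one directory, integer cache ids, and message tuples invented for the example (`"invalidate"`, `"downgrade"`, `"grant_read"`, `"grant_write"`); acknowledgements and data transfer are left out.

```python
class Directory:
    """Tracks sharers per block and serializes all requests."""
    def __init__(self):
        self.sharers = {}   # block address -> set of cache ids holding a copy
        self.owner = {}     # block address -> cache id with write permission
        self.messages = []  # directives, in the order the directory issued them

    def read_request(self, cache_id, addr):
        owner = self.owner.get(addr)
        if owner is not None and owner != cache_id:
            # The current writer loses exclusive permission.
            self.messages.append(("downgrade", owner, addr))
            del self.owner[addr]
        self.sharers.setdefault(addr, set()).add(cache_id)
        self.messages.append(("grant_read", cache_id, addr))

    def write_request(self, cache_id, addr):
        # Invalidate every other sharer, then grant permission to the writer.
        others = sorted(self.sharers.get(addr, set()) - {cache_id})
        for sharer in others:
            self.messages.append(("invalidate", sharer, addr))
        self.sharers[addr] = {cache_id}
        self.owner[addr] = cache_id
        self.messages.append(("grant_write", cache_id, addr))
        return others

directory = Directory()
for cache_id in (0, 1, 2):
    directory.read_request(cache_id, 0x80)
invalidated = directory.write_request(1, 0x80)
```

Since every request passes through `read_request` or `write_request`, the order of entries in `messages` plays the role the bus order plays in a snooping protocol.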

[Figure: four processors, each with its own caches, connected through a network to memory and the directory]

TLDS

• A certain ordering of reads and writes is assumed; if that ordering is violated, the thread is re-executed
• The coherence protocol is used to propagate writes
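One way to picture the violation check is the sketch below. It is an assumption-laden illustration rather than the TLDS mechanism itself: each thread has a position in the assumed sequential order and a set of addresses it has read, and a propagated write squashes any later thread that already read the written address. The names `SpeculativeThread` and `propagate_write` are hypothetical.

```python
class SpeculativeThread:
    def __init__(self, order):
        self.order = order     # position in the assumed sequential order
        self.read_set = set()  # addresses read so far in this attempt
        self.restarts = 0

    def read(self, addr):
        self.read_set.add(addr)

    def restart(self):
        # Re-execution starts from scratch, so earlier reads are forgotten.
        self.read_set.clear()
        self.restarts += 1

def propagate_write(writer, addr, threads):
    """Deliver a write to all threads; squash later threads that read too early."""
    squashed = []
    for t in threads:
        if t.order > writer.order and addr in t.read_set:
            t.restart()
            squashed.append(t.order)
    return squashed

threads = [SpeculativeThread(i) for i in range(3)]
threads[2].read(0x10)  # thread 2 reads the address before thread 0 writes it
threads[1].read(0x20)  # thread 1 reads an unrelated address
violations = propagate_write(threads[0], 0x10, threads)
```

Here only thread 2 is re-executed; a write by a later thread to an address read by an earlier one does not violate the assumed order and squashes nothing.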

[Figure: four caches connected to memory]

• No thread is speculative: a parallel application with synchronization points and parallel regions is guaranteed to execute correctly with no need for re-execution
• Threads wait at synchronization points and wait for the correct permissions for every block of data

[Figure: four caches connected to memory]

Coherence Decoupling

• A simple coherence protocol is often a slow protocol; for example, a simple protocol may not allow multiple outstanding requests
• Coherence decoupling: run a fast but possibly incorrect (speculative) protocol alongside a slow but correct backing protocol; this incurs fewer stalls in the common case, at the cost of occasional recoveries

Coherence Decoupling

• A coherence operation is broken into two components: (i) acquiring and using the value, (ii) receiving the correct set of permissions

SCL Protocol

• Why does speculative cache look-up work?
• False sharing: a line was invalidated, but a different word was written to
• Silent stores or value locality
• If there is spare bandwidth, updated values can be pushed out to sharers
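The first two reasons can be illustrated with a small sketch. It assumes a four-word cache line stored as a Python list and made-up values; `speculative_lookup` and `speculation_correct` are hypothetical helper names. The load uses the word left behind in the invalidated line, and the check compares it with the same word of the coherent line once that arrives.

```python
def speculative_lookup(stale_line, word):
    """Use the value left in an invalidated line without waiting for the miss."""
    return stale_line[word]

def speculation_correct(stale_line, fresh_line, word):
    """Check the speculatively used word once the coherent line arrives."""
    return speculative_lookup(stale_line, word) == fresh_line[word]

stale = [10, 20, 30, 40]          # local copy, marked invalid

false_sharing = [10, 20, 30, 99]  # the remote store touched word 3 only
silent_store = [10, 20, 30, 40]   # the remote store rewrote the same value
true_sharing = [11, 20, 30, 40]   # the remote store changed word 0

# The local load reads word 0 in every case.
results = {
    "false sharing": speculation_correct(stale, false_sharing, 0),
    "silent store": speculation_correct(stale, silent_store, 0),
    "true sharing": speculation_correct(stale, true_sharing, 0),
}
```

Only the true-sharing case yields a mismatch, which is the case that would need a recovery.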

Implementation

• The Miss Status Holding Register (MSHR) keeps track of outstanding requests; it can buffer the speculative value and ensure it matches the correct value; on a mis-speculation, that instruction is treated like a branch mis-predict
• Speculation on a coherence operation is no different from traditional forms of speculation
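A minimal sketch of the buffering-and-compare step follows, assuming one MSHR entry per outstanding block and an integer sequence number identifying the load that consumed the speculative value. The `Mshr` and `MshrEntry` classes and their fields are invented for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class MshrEntry:
    addr: int
    speculative_value: int
    load_seq: int  # sequence number of the load that used the value

class Mshr:
    def __init__(self):
        self.entries = {}  # block address -> MshrEntry

    def allocate(self, addr, speculative_value, load_seq):
        """Record an outstanding miss and the value the load ran ahead with."""
        self.entries[addr] = MshrEntry(addr, speculative_value, load_seq)

    def fill(self, addr, correct_value):
        """Called when the backing protocol delivers the coherent value.

        Returns None if the speculation was right, otherwise the sequence
        number to squash from, as a branch mis-predict would.
        """
        entry = self.entries.pop(addr)
        if entry.speculative_value == correct_value:
            return None
        return entry.load_seq

mshr = Mshr()
mshr.allocate(0x100, speculative_value=7, load_seq=12)
mshr.allocate(0x140, speculative_value=3, load_seq=15)
ok = mshr.fill(0x100, correct_value=7)           # values match: nothing to undo
squash_from = mshr.fill(0x140, correct_value=4)  # mismatch: recover from load 15
```

Returning a sequence number mirrors the slide's point that recovery can reuse the machinery already present for branch mis-predictions.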

Summary

• Arguments for coherence decoupling:
• Reduces protocol complexity
• Reduces programming complexity