CS 7810 Lecture 19

Coherence Decoupling: Making Use of Incoherence

J. Huh, J. Chang, D. Burger, G. Sohi

Proceedings of ASPLOS-XI

October 2004


Coherence / Consistency

  • Coherence guarantees (i) that a write will eventually be seen by other processors, and (ii) write serialization (all processors see writes to the same location in the same order)

  • The consistency model defines the ordering of writes and reads to different memory locations – the hardware guarantees a certain consistency model and the programmer attempts to write correct programs with those assumptions


Consistency Examples

Initially, A = B = 0

P1                      P2
A = 1                   B = 1
if (B == 0)             if (A == 0)
  critical section        critical section

Initially, A = B = 0

P1          P2              P3
A = 1
            if (A == 1)
              B = 1
                            if (B == 1)
                              register = A

P1              P2
Data = 2000     while (Head == 0)
Head = 1          { }
                … = Data
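
The first example can be checked mechanically. The sketch below is only an illustration, not taken from the lecture or the paper: it assumes sequential consistency (one shared memory, each thread's operations kept in program order) and enumerates every interleaving of P1 and P2. The names `interleavings`, `run`, `outcomes`, and `both_enter` are made up for this example.

```python
from itertools import combinations

# Each thread is a list of operations in program order (first example above).
P1 = [("write", "A", 1), ("read", "B")]
P2 = [("write", "B", 1), ("read", "A")]

def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that preserves each thread's program order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [(1, next(it1)) if i in slots else (2, next(it2)) for i in range(n)]

def run(schedule):
    """Execute one schedule on a single shared memory; return who enters."""
    mem = {"A": 0, "B": 0}
    enters = set()
    for tid, op in schedule:
        if op[0] == "write":
            mem[op[1]] = op[2]
        elif mem[op[1]] == 0:  # the if-test read saw 0: enter the critical section
            enters.add(tid)
    return frozenset(enters)

outcomes = {run(s) for s in interleavings(P1, P2)}
both_enter = frozenset({1, 2}) in outcomes
print(both_enter)  # False
```

Under these assumptions no interleaving lets both processors into the critical section. If the hardware lets a read complete before an earlier write to a different location becomes visible, that guarantee no longer holds, which is why the programmer has to know the consistency model.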


Snooping-Based Cache Coherence

  • Caches share a bus; every cache sees each transaction in the same cycle; every cache manages itself

  • When one cache writes to a block, every other cache invalidates its copy of that block

  • When a cache has a read miss, the block is provided by memory or the last writer

  • Protocols are defined by states: MSI, MESI, MOESI
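
The bullets above can be sketched as a toy MSI-style model. This is a simplification under stated assumptions, not a real protocol implementation: the `Cache` and `Bus` classes are hypothetical, a block holds a single value, and there are no transient states or bus arbitration.

```python
class Cache:
    def __init__(self, name):
        self.name = name
        self.state = {}  # block -> "M" or "S"; an absent block is Invalid
        self.data = {}   # block -> cached value

class Bus:
    """Hypothetical shared bus: every cache snoops every transaction."""
    def __init__(self, caches, memory):
        self.caches = caches
        self.memory = memory

    def read(self, cache, block):
        if block in cache.state:  # hit in M or S
            return cache.data[block]
        # Read miss: if some cache holds the block in M, the last writer
        # supplies it (modelled here as a write-back plus downgrade to S).
        for other in self.caches:
            if other is not cache and other.state.get(block) == "M":
                self.memory[block] = other.data[block]
                other.state[block] = "S"
        cache.state[block] = "S"
        cache.data[block] = self.memory[block]
        return cache.data[block]

    def write(self, cache, block, value):
        # Every other cache sees the write on the bus and invalidates its copy.
        for other in self.caches:
            if other is not cache:
                other.state.pop(block, None)
                other.data.pop(block, None)
        cache.state[block] = "M"
        cache.data[block] = value

memory = {"X": 0}
c0, c1 = Cache("c0"), Cache("c1")
bus = Bus([c0, c1], memory)
bus.read(c0, "X")
bus.read(c1, "X")                       # both caches now hold X in S
bus.write(c0, "X", 7)                   # c1's copy is invalidated, c0 is in M
c1_state_after_write = c1.state.get("X")
value_seen_by_c1 = bus.read(c1, "X")    # read miss served by the last writer
print(c1_state_after_write, value_seen_by_c1)  # None 7
```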

[Figure: four processors, each with its own caches, sharing a single memory]


Directory-Based Cache Coherence

  • A directory keeps track of the sharing status of each block

  • Every request goes to the directory and the directory then sends directives to each cache – the directory is the point of serialization (just as the bus is, in a snooping protocol)

  • For example, on a write, the request reaches the directory, the directory sends invalidates to other sharers, and permissions are granted to the writer
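
The write example above can be sketched as follows. This is a minimal model under assumptions of my own: the `Directory` class, its message names (`invalidate`, `downgrade`, `grant_read`, `grant_write`), and the `log` list are all hypothetical, and acknowledgements and data transfer are left out.

```python
class Directory:
    """Hypothetical directory: per-block sharer set plus an optional owner."""
    def __init__(self):
        self.sharers = {}  # block -> set of cache ids holding a read-only copy
        self.owner = {}    # block -> cache id holding write permission
        self.log = []      # messages sent, in the order the directory serializes them

    def read_request(self, cache_id, block):
        prev = self.owner.pop(block, None)
        holders = self.sharers.setdefault(block, set())
        if prev is not None:
            if prev != cache_id:
                self.log.append(("downgrade", prev, block))
            holders.add(prev)  # the old owner keeps a read-only copy
        holders.add(cache_id)
        self.log.append(("grant_read", cache_id, block))

    def write_request(self, cache_id, block):
        # Invalidate every other copy, then grant write permission.
        others = self.sharers.pop(block, set()) - {cache_id}
        prev = self.owner.get(block)
        if prev is not None and prev != cache_id:
            others.add(prev)
        for other in sorted(others):
            self.log.append(("invalidate", other, block))
        self.owner[block] = cache_id
        self.log.append(("grant_write", cache_id, block))

d = Directory()
for cid in (0, 1, 2):
    d.read_request(cid, "X")   # caches 0, 1 and 2 all share block X
d.write_request(1, "X")        # cache 1 wants to write
print(d.log[-3:])
# [('invalidate', 0, 'X'), ('invalidate', 2, 'X'), ('grant_write', 1, 'X')]
```

Because every request passes through `read_request` or `write_request`, the order of entries in `log` is the serialization order for the block.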

[Figure: four processors, each with its own caches, connected through a network to memory and the directory]


TLDS (Thread-Level Data Speculation)

  • A certain ordering of reads and writes is assumed – if that ordering is violated, the thread is re-executed

  • The coherence protocol is used to propagate writes
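
A rough sketch of the violation check, under simplifying assumptions that are mine rather than the lecture's: threads are numbered in the assumed sequential order, each speculative thread records the addresses it has read, and a write by an earlier thread to an address a later thread has already read counts as a violation. The `SpecThread` class and `write` function are hypothetical names, and buffering of speculative writes is not modelled.

```python
class SpecThread:
    def __init__(self, order):
        self.order = order     # position in the assumed sequential order
        self.read_set = set()  # addresses read speculatively so far
        self.restarts = 0

    def read(self, memory, addr):
        self.read_set.add(addr)
        return memory[addr]

def write(writer, threads, memory, addr, value):
    memory[addr] = value
    # The write is propagated to the other threads (by the coherence
    # protocol in hardware); later threads check it against their reads.
    for t in threads:
        if t.order > writer.order and addr in t.read_set:
            # A later thread read addr before this earlier write: squash it.
            t.read_set.clear()
            t.restarts += 1

memory = {"X": 0}
t1, t2 = SpecThread(1), SpecThread(2)
threads = [t1, t2]
early_value = t2.read(memory, "X")    # thread 2 runs ahead and reads the old X
write(t1, threads, memory, "X", 5)    # thread 1's write exposes the violation
retried_value = t2.read(memory, "X")  # thread 2 re-executes and sees the new X
print(early_value, t2.restarts, retried_value)  # 0 1 5
```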

[Figure: Threads 1 to 4, each with its own caches, sharing a single memory]


The Traditional Model

  • No thread is speculative – a parallel application has synchronization points and parallel regions and is guaranteed to execute correctly with no need for re-execution

  • Threads wait at synchronization points and wait for the correct permissions for every block of data

[Figure: Threads 1 to 4, each with its own caches, sharing a single memory]


Coherence Decoupling

  • A simple coherence protocol is often a slow protocol – for example, a simple protocol may not allow multiple outstanding requests

  • Coherence decoupling: maintain a fast but possibly incorrect protocol alongside a slow and correct backing protocol; this incurs fewer stalls in the common case, at the cost of occasional recoveries


Coherence Decoupling

  • A coherence operation is broken into two components: (i) acquiring and using the value, (ii) receiving the correct set of permissions


SCL (Speculative Cache Look-up) Protocol

  • Why does speculative cache look-up work?

    • False sharing: a line was invalidated, but a different word was written to

    • Silent stores or value locality

    • If there is spare bandwidth, updated values can be pushed out to sharers
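
The first two cases above can be made concrete with a toy cache line. This is only an illustration: the four-word line, the values, and the helper names `remote_write` and `speculation_correct` are assumptions, not details from the paper.

```python
def remote_write(line, word_index, value):
    """Return the up-to-date line after another processor writes one word."""
    new_line = list(line)
    new_line[word_index] = value
    return new_line

def speculation_correct(stale_line, fresh_line, word_index):
    """A speculative load is correct if the word it read did not change."""
    return stale_line[word_index] == fresh_line[word_index]

stale = [10, 20, 30, 40]                    # local copy, now invalidated
false_sharing = remote_write(stale, 3, 99)  # a different word was written
silent_store = remote_write(stale, 1, 20)   # same word, same value
true_sharing = remote_write(stale, 1, 21)   # same word, new value

# The local processor speculatively loads word 1 from its stale copy.
results = [speculation_correct(stale, fresh, 1)
           for fresh in (false_sharing, silent_store, true_sharing)]
print(results)  # [True, True, False]
```

Only the last case, where the word actually read changed value, would require a recovery.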


Implementation

  • The Miss Status Holding Register (MSHR) keeps track of outstanding requests; it can buffer the speculative value and check that it matches the correct value once that value arrives. On a mis-speculation, the instruction is treated like a branch mis-predict

  • Speculation on a coherence operation is no different from traditional forms of speculation
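
The MSHR check described above might look like the sketch below. It is a software analogy under stated assumptions, not the hardware design from the paper: the `MSHR` class, its `miss` and `fill` methods, and the `"commit"` / `"squash"` results are hypothetical names.

```python
class MSHR:
    """Hypothetical MSHR: one buffered speculative value per outstanding address."""
    def __init__(self):
        self.pending = {}  # addr -> value the core is speculating on

    def miss(self, addr, stale_value):
        """Record an outstanding request; the core proceeds with the stale value."""
        self.pending[addr] = stale_value
        return stale_value

    def fill(self, addr, correct_value):
        """The coherent value arrived: compare it with the buffered speculation."""
        speculative = self.pending.pop(addr)
        # A mismatch is handled like a branch mis-predict: squash and replay.
        return "commit" if speculative == correct_value else "squash"

mshr = MSHR()
mshr.miss(0x40, 20)
mshr.miss(0x80, 30)
first = mshr.fill(0x40, 20)   # speculation matched the coherent value
second = mshr.fill(0x80, 31)  # the value had changed: recover
print(first, second)  # commit squash
```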






Summary

  • Arguments for coherence decoupling:

    • Reduces protocol complexity

    • Reduces programming complexity

    • Marginal hardware overhead

    • Coherence misses will emerge as greater bottlenecks?

  • What is the expected trend for CMPs?

