Presentation Transcript
slide1

CS 7810 Lecture 19

Coherence Decoupling: Making Use of Incoherence

J. Huh, J. Chang, D. Burger, G. Sohi

Proceedings of ASPLOS-XI

October 2004

slide2

Coherence / Consistency

  • Coherence guarantees (i) that a write will eventually be seen by other processors, and (ii) write serialization (all processors see writes to the same location in the same order)
  • The consistency model defines the ordering of writes and reads to different memory locations – the hardware guarantees a certain consistency model and the programmer attempts to write correct programs under those assumptions
slide3

Consistency Examples

Example 1: Initially, A = B = 0

  P1                        P2
  A = 1                     B = 1
  if (B == 0)               if (A == 0)
      critical section          critical section

Example 2: Initially, A = B = 0

  P1           P2                  P3
  A = 1        if (A == 1)         if (B == 1)
                   B = 1               register = A

Example 3:

  P1                        P2
  Data = 2000               while (Head == 0) { }
  Head = 1                  … = Data
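Example 1 is the classic flag-based mutual-exclusion idiom: under sequential consistency at most one processor can find the other's flag still 0, but with compiler or hardware reordering both may enter the critical section unless a fence orders each store before the following load. A minimal C sketch of Example 1, assuming POSIX threads and the GCC/Clang full-barrier builtin __sync_synchronize(); the thread functions and the *_entered flags are illustrative:

/* Example 1 in C; without the fences, the load of the other flag may be
 * reordered above the store to our own flag, and both threads can enter. */
#include <pthread.h>
#include <stdio.h>

volatile int A = 0, B = 0;
int p1_entered = 0, p2_entered = 0;

void *p1(void *arg) {
    A = 1;
    __sync_synchronize();          /* full fence: store to A ordered before the load of B */
    if (B == 0)
        p1_entered = 1;            /* "critical section" */
    return NULL;
}

void *p2(void *arg) {
    B = 1;
    __sync_synchronize();
    if (A == 0)
        p2_entered = 1;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, p1, NULL);
    pthread_create(&t2, NULL, p2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("P1 entered: %d, P2 entered: %d\n", p1_entered, p2_entered);
    return 0;
}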

slide4

Snooping-Based Cache Coherence

  • Caches share a bus; every cache sees each transaction in the same cycle; every cache manages itself
  • When one cache writes to a block, every other cache invalidates its copy of that block
  • When a cache has a read miss, the block is provided by memory or by the last writer
  • Protocols are defined by states: MSI, MESI, MOESI (a minimal MSI sketch follows the figure below)

[Figure: four processors, each with its own caches, sharing a bus to memory]
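These protocols define a per-line state machine. A minimal sketch of the MSI version as a C transition function; the event names PrRd, PrWr, BusRd, BusRdX follow common textbook usage and are not taken from the slide:

/* MSI state transitions for one cache line, as seen by one cache
 * in a snooping protocol. */
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } MsiState;
typedef enum { PR_RD, PR_WR, BUS_RD, BUS_RDX } Event;  /* processor read/write, snooped bus read / read-exclusive */

MsiState msi_next(MsiState s, Event e) {
    switch (s) {
    case INVALID:
        if (e == PR_RD)   return SHARED;    /* read miss: fetch the block; others may share it */
        if (e == PR_WR)   return MODIFIED;  /* write miss: issue BusRdX, other copies are invalidated */
        return INVALID;
    case SHARED:
        if (e == PR_WR)   return MODIFIED;  /* upgrade: invalidate the other sharers */
        if (e == BUS_RDX) return INVALID;   /* another cache is writing: drop our copy */
        return SHARED;
    case MODIFIED:
        if (e == BUS_RD)  return SHARED;    /* supply the block (we are the last writer), keep a shared copy */
        if (e == BUS_RDX) return INVALID;   /* supply the block, then invalidate */
        return MODIFIED;
    }
    return INVALID;
}

int main(void) {
    MsiState s = INVALID;
    s = msi_next(s, PR_WR);   /* local write: I -> M */
    s = msi_next(s, BUS_RD);  /* remote read miss snooped: M -> S */
    printf("final state: %d\n", s);  /* prints 1 (SHARED) */
    return 0;
}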

slide5

Directory-Based Cache Coherence

  • A directory keeps track of the sharing status of each block
  • Every request goes to the directory, which then sends directives to each cache – the directory is the point of serialization (just as the bus is in a snooping protocol)
  • For example, on a write, the request reaches the directory, the directory sends invalidates to the other sharers, and permissions are granted to the writer (see the sketch after the figure below)

[Figure: four processors, each with its own caches, connected by a network to memory and a directory]
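A hedged sketch of the write path just described, assuming a simple full-bit-vector directory with four nodes; the struct fields and function names are illustrative, not from the paper:

/* One directory entry and the handling of a write request. */
#include <stdint.h>
#include <stdio.h>

#define NUM_NODES 4

typedef struct {
    uint8_t sharers;   /* bit i set => cache i holds a copy of the block */
    int     owner;     /* node holding the block exclusively, or -1 */
} DirEntry;

/* Handle a write request from node `writer`: invalidate all other sharers,
 * then grant exclusive permission to the writer. */
void dir_handle_write(DirEntry *e, int writer) {
    for (int i = 0; i < NUM_NODES; i++) {
        if (i != writer && (e->sharers & (1u << i))) {
            printf("send invalidate to node %d\n", i);   /* directory directive */
            e->sharers &= (uint8_t)~(1u << i);
        }
    }
    e->sharers = (uint8_t)(1u << writer);
    e->owner = writer;
    printf("grant exclusive permission to node %d\n", writer);
}

int main(void) {
    DirEntry e = { .sharers = 0x7, .owner = -1 };  /* nodes 0, 1, 2 share the block */
    dir_handle_write(&e, 0);
    return 0;
}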

slide6

TLDS (Thread-Level Data Speculation)

  • A certain ordering of reads and writes is assumed – if that ordering is violated, the thread is re-executed
  • The coherence protocol is used to propagate writes

[Figure: four speculative threads, each with its own caches, sharing memory]

slide7

The Traditional Model

  • No thread is speculative – a parallel application with synchronization points and parallel regions is guaranteed to execute correctly, with no need for re-execution
  • Threads stall at synchronization points and wait for the correct permissions for every block of data

[Figure: four threads, each with its own caches, sharing memory]

slide8

Coherence Decoupling

  • A simple coherence protocol is often a slow protocol – for example, a simple protocol may not allow multiple outstanding requests
  • Coherence decoupling: maintain a fast but incorrect protocol alongside a slow but correct backing protocol; this incurs fewer stalls in the common case, at the cost of occasional recoveries
slide9

Coherence Decoupling

  • A coherence operation is broken into two components: (i) acquiring and using the value, and (ii) receiving the correct set of permissions (a sketch of the decoupled flow follows)
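A minimal sketch of the two decoupled phases, not the paper's hardware: a load consumes a possibly incoherent value immediately, and that value is checked when the coherent reply and permissions arrive; a mismatch is handled like a branch mis-prediction. All names are illustrative:

/* Phase (i): use the local, possibly stale, value without waiting.
 * Phase (ii): verify it against the coherent value when it arrives. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    long speculative_value;   /* value read from the (possibly incoherent) local copy */
    bool verified;
} SpecLoad;

SpecLoad speculative_load(long local_copy) {
    SpecLoad s = { .speculative_value = local_copy, .verified = false };
    return s;
}

/* Returns true if execution can continue; false means squash and re-execute. */
bool verify_load(SpecLoad *s, long coherent_value) {
    s->verified = true;
    return s->speculative_value == coherent_value;
}

int main(void) {
    SpecLoad s = speculative_load(42);       /* use the stale copy right away */
    if (!verify_load(&s, 42))                /* coherent value arrives later */
        printf("mis-speculation: flush and re-execute from the load\n");
    else
        printf("speculation correct: no stall was incurred\n");
    return 0;
}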
slide10

SCL Protocol

  • Why does speculative cache look-up work?
    • False sharing: the line was invalidated, but a different word in it was written to (see the sketch below)
    • Silent stores or value locality
    • If there is spare bandwidth, updated values can be pushed out to sharers
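For the false-sharing case, a small layout illustration in C, assuming 64-byte cache lines and a GCC/Clang compiler for the alignment attribute: x and y fall in the same line, so a remote write to y invalidates the whole line even though the cached value of x still matches the coherent one, which is exactly the situation in which a speculative look-up of x returns the right value.

/* Two logically independent words forced into one cache line. */
#include <stdio.h>

struct __attribute__((aligned(64))) Line {
    int x;   /* only ever read by core 0 */
    int y;   /* only ever written by core 1 */
};

int main(void) {
    struct Line l = { .x = 7, .y = 0 };
    /* A write to y by another core invalidates core 0's copy of the whole
     * line, but the stale value of x (7) still equals the coherent value,
     * so speculatively using it would turn out to be correct. */
    l.y = 1;
    printf("x = %d (unchanged by the write to y)\n", l.x);
    return 0;
}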

slide11

Implementation

  • The Miss Status Holding Register (MSHR) keeps track of outstanding requests – it can buffer the speculative value and check that it matches the correct value when it arrives – on a mis-speculation, the instruction is treated like a branch mis-predict (see the sketch below)
  • Speculation on a coherence operation is no different from traditional forms of speculation
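A hypothetical sketch of what an MSHR entry might record to support this; the field and function names are illustrative and not taken from the paper:

/* An MSHR entry records the outstanding miss and the value that was
 * speculatively forwarded, so it can be compared with the coherent reply. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    bool     valid;        /* entry tracks an outstanding miss */
    uint64_t block_addr;   /* address of the missing block */
    long     spec_value;   /* value speculatively supplied to the pipeline */
    bool     spec_used;    /* did any instruction consume spec_value? */
} MshrEntry;

/* Called when the coherent data returns; returns true if the pipeline must
 * be flushed from the consuming load, exactly like a branch mis-predict. */
bool mshr_on_fill(MshrEntry *e, long coherent_value) {
    bool mispredict = e->spec_used && (e->spec_value != coherent_value);
    e->valid = false;      /* the miss is now resolved */
    return mispredict;
}

int main(void) {
    MshrEntry e = { .valid = true, .block_addr = 0x1000, .spec_value = 7, .spec_used = true };
    printf("flush needed: %d\n", mshr_on_fill(&e, 7));  /* 0: the guess was right */
    return 0;
}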
slide16

Summary

  • Arguments for coherence decoupling:
    • Reduces protocol complexity
    • Reduces programming complexity
    • Marginal hardware overhead
    • Coherence misses will emerge as greater bottlenecks?

  • What is the expected trend for CMPs?