
Logical Protocol to Physical Design

  • Correctness: a cache-coherent memory system must satisfy the requirements of coherence. It should ensure that stale copies are found and invalidated or updated on writes, and it should provide write serialization.
  • Complications: actions that are considered atomic at the abstract level are not necessarily atomic at the hardware level.

Implementation must contend with:
  • Correctness
  • High performance
  • Minimal extra hardware

Logical to Physical Design
  • Performance issues:
  • We want to pipeline memory operations and allow many operations to be outstanding at the same time, rather than waiting for each operation to complete before starting the next. This is done for the sake of performance!
  • However, correctness may be compromised! Out-of-order execution and speculation complicate this issue.
Logical Protocol Algorithm
  • Set of States
  • Events causing state transitions
  • Actions on Transition
  • While actions like PrWr/BusRdX seem atomic, in reality they may not be atomic in hardware (see the sketch below)!
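
As an illustration only (not from the original slides), the logical protocol can be viewed as a per-block transition function from (state, event) to (next state, bus action). A minimal C sketch of such a table-driven FSM, using simplified MESI-style states and invented event/action names, might look like this:

    #include <stdio.h>

    /* Simplified, illustrative MESI-style block states. */
    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } State;

    /* Events seen by the cache controller (processor side and bus side). */
    typedef enum { PR_RD, PR_WR, BUS_RD, BUS_RDX } Event;

    /* Bus actions the controller may request on a transition. */
    typedef enum { NONE, BUS_READ, BUS_READ_X, FLUSH } Action;

    typedef struct { State next; Action action; } Transition;

    /* One "atomic" logical step such as (SHARED, PrWr) -> (MODIFIED, BusRdX)
     * expands in hardware into arbitration, snooping and data transfer. */
    static Transition transition(State s, Event e) {
        switch (s) {
        case SHARED:
            if (e == PR_WR)   return (Transition){MODIFIED, BUS_READ_X};
            if (e == BUS_RDX) return (Transition){INVALID, NONE};
            break;
        case MODIFIED:
            if (e == BUS_RD)  return (Transition){SHARED, FLUSH};
            if (e == BUS_RDX) return (Transition){INVALID, FLUSH};
            break;
        default:
            break;                       /* other cases omitted for brevity */
        }
        return (Transition){s, NONE};
    }

    int main(void) {
        Transition t = transition(SHARED, PR_WR);
        printf("next state = %d, bus action = %d\n", t.next, t.action);
        return 0;
    }
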
Reality
  • Protocol defines a logical FSM for each block. In reality, implementation involves:
  • Cache controller FSM
  • Bus controller FSM
  • Other cache controllers get the bus
  • Multiple bus transactions
  • Multi-level caches
  • Split-transaction buses

[Figure 1: a uniprocessor cache design, showing the processor, the cache (tag, state, and data arrays), the cache controller, and the bus controller.]
Let's first look at a design with an atomic bus and a single-level cache!

Typical Bus Protocol
  • Bus state machine
    • Assert request for bus
    • Wait for bus grant
    • Drive address and command lines
    • Wait for command to be accepted by relevant device
    • Transfer data

[Timing diagram: the requester asserts its bus request (BReq/BR), waits for the bus grant (BG/BGnt), drives the address and waits for the OK acknowledgement, then transfers the data; other requesters may get the bus between transactions.]
Correctness Issues
  • Fulfill the conditions for coherence and consistency
    • Write propagation and atomicity
    • Write serialization

Free of:

  • Deadlock: all system activity ceases
    • Cycle of resource dependences
  • Livelock: no processor makes forward progress even though transactions are performed at the hardware level
    • e.g. simultaneous writes in an invalidation-based protocol
      • each requests ownership, invalidating the other, but loses its copy before winning arbitration for the bus
  • Starvation: one or more processors make no forward progress while others do
    • Often not completely eliminated (not likely, not catastrophic)


Preliminary Design Issues
  • How should we design the cache controller and tags, given that both the processor and the bus controller need access to the tags for lookup?
  • How and when should snoop results be presented on the bus? Recall that snoop results are reported in response to bus transactions
  • Dealing with write-backs
  • Even though the bus is atomic, the overall set of actions needed to satisfy a memory operation uses other resources (such as cache controllers) and is not atomic: this can introduce race conditions
  • New issues: deadlock, livelock, starvation, serialization, etc.
Contention for Cache Tags (see Figure 1)
  • Cache controller must monitor bus and processor
    • Can view as two controllers: bus-side, and processor-side
    • With single-level cache: dual tags (not data) or dual-ported tag RAM
      • must reconcile when updated, but usually only looked up
    • Respond to bus transactions

[Figure: dual-tag organization. One copy of the tags and state is used by the processor, the other by the bus snooper; the data array itself is not duplicated. When the state or tag for a block is updated, both copies of the tags must be updated.]
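
A minimal sketch (assuming a direct-mapped cache; the geometry and all names are invented) of how duplicated tag/state arrays let the processor-side and bus-side controllers look up tags independently, while updates go to both copies:

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_SETS 256   /* illustrative direct-mapped geometry, 64-byte blocks */

    /* Simplified MESI-style state; names are illustrative. */
    typedef enum { I, S, E, M } State;

    typedef struct { uint32_t tag; State state; } TagEntry;

    /* Duplicated tag/state arrays: one copy for the processor-side
     * controller, one for the bus-side snooper. Data is NOT duplicated. */
    typedef struct {
        TagEntry proc_tags[NUM_SETS];
        TagEntry snoop_tags[NUM_SETS];
        uint8_t  data[NUM_SETS][64];
    } DualTagCache;

    static unsigned set_of(uint32_t addr) { return (addr / 64) % NUM_SETS; }
    static uint32_t tag_of(uint32_t addr) { return (addr / 64) / NUM_SETS; }

    /* Lookups proceed independently on the two copies, so the snooper
     * does not contend with the processor for the same tag RAM port. */
    static bool proc_hit(const DualTagCache *c, uint32_t addr) {
        const TagEntry *e = &c->proc_tags[set_of(addr)];
        return e->state != I && e->tag == tag_of(addr);
    }
    static bool snoop_hit(const DualTagCache *c, uint32_t addr) {
        const TagEntry *e = &c->snoop_tags[set_of(addr)];
        return e->state != I && e->tag == tag_of(addr);
    }

    /* Updates are rarer than lookups, but must be applied to BOTH copies
     * so the two views of tag/state never diverge. */
    static void update_block(DualTagCache *c, uint32_t addr, State s) {
        unsigned set = set_of(addr);
        TagEntry e = { tag_of(addr), s };
        c->proc_tags[set]  = e;
        c->snoop_tags[set] = e;
    }

    int main(void) {
        static DualTagCache c;          /* zero-initialized: everything Invalid */
        update_block(&c, 0x1240, S);
        printf("proc hit=%d snoop hit=%d\n",
               proc_hit(&c, 0x1240), snoop_hit(&c, 0x1240));
        return 0;
    }
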

Snoop Results
  • Snooping introduces a new issue to bus transactions. For snooping caches, each cache must check the address against its tags, and the collective result of the snoop from all caches must be reported on the bus before the transaction can proceed.
  • One particular function of the snoop result is to inform memory whether it should respond to the request, or whether some cache is holding a modified copy of the block so that an alternative action is necessary. The questions are:
    • When is the snoop result reported on the bus? and
    • In what form (how)?
Reporting Snoop Results: How? (In what form should snoop responses be reported on the bus?)
  • Collective response from $’s must appear on bus
  • Example: in MESI protocol, need to know
    • Is block dirty; i.e. should memory respond or not?
    • Is block shared; i.e. transition to E or S state on read miss?
  • Three wired-OR signals
    • Shared: asserted if any cache has a copy
    • Dirty: asserted if some cache has a dirty copy (modified copy)
      • needn’t know which, since it will do what’s necessary
    • Snoop-valid: asserted when OK to check other two signals
      • actually inhibit until OK to check
  • Illinois MESI requires priority scheme for cache-to-cache transfers
    • Which cache should supply data when in shared state?
    • Commercial implementations allow memory to provide data
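
Illustratively (types and field names are invented), the three wired-OR lines above can be modeled as the combination of per-cache snoop outcomes; Shared and Dirty are ORs across caches, while the snoop-valid indication only becomes meaningful once every cache has finished its tag check:

    #include <stdbool.h>
    #include <stdio.h>

    /* Per-cache snoop outcome for one bus transaction (illustrative). */
    typedef struct {
        bool has_copy;   /* block present in S/E state            */
        bool has_dirty;  /* block present in M (dirty) state      */
        bool done;       /* this cache has finished its tag check */
    } SnoopOutcome;

    /* Collective result as seen on the three wired-OR bus lines. */
    typedef struct {
        bool shared;       /* OR of has_copy  across all caches  */
        bool dirty;        /* OR of has_dirty across all caches  */
        bool snoop_valid;  /* only meaningful once ALL have checked */
    } WiredOr;

    static WiredOr combine(const SnoopOutcome *c, int n) {
        WiredOr w = { false, false, true };
        for (int i = 0; i < n; i++) {
            w.shared      = w.shared || c[i].has_copy || c[i].has_dirty;
            w.dirty       = w.dirty  || c[i].has_dirty;
            w.snoop_valid = w.snoop_valid && c[i].done;  /* everyone has checked */
        }
        return w;
    }

    int main(void) {
        SnoopOutcome caches[3] = {
            { false, false, true },   /* cache 0: no copy           */
            { false, false, true },   /* cache 1: no copy           */
            { false, true,  true },   /* cache 2: holds block dirty */
        };
        WiredOr w = combine(caches, 3);
        /* dirty asserted: memory should NOT respond; the owner supplies data. */
        printf("shared=%d dirty=%d snoop_valid=%d\n", w.shared, w.dirty, w.snoop_valid);
        return 0;
    }
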
Reporting Snoop Results: How?
  • In the case of the MESI protocol:
    • The requesting cache controller needs to know whether the requested memory block is in another processor's cache, so that it can decide whether to load the block in the E or S state.
    • In addition, memory needs to know whether any cache has the block in the modified (M) state, in which case memory need not respond.
  • Solutions
    • See the wired-OR snoop signals on the previous slide.

Reporting Snoop Results: When? (3 options)
  • Memory needs to know what, if anything, to do
  • Fixed number of clocks from address appearing on bus
    • Dual tags required to reduce contention with processor
    • Still must be conservative (update both tags on write: E -> M) (assume worst-case delay)
    • Pentium Pro, HP servers, Sun Enterprise
  • Variable delay
    • Memory assumes cache will supply data till all say “sorry”
    • Less conservative, more flexible, more complex
    • Memory can fetch data and hold just in case (SGI Challenge)
  • Immediately: Bit-per-block in memory
    • Extra hardware complexity in commodity main memory system
    • (An extra bit per block in memory indicates whether the block is modified in some cache. If it is modified, memory does not respond; if not, it responds.)
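
As a sketch only (all names invented), the three options above differ mainly in how the memory controller decides whether it should supply the data:

    #include <stdio.h>
    #include <stdbool.h>

    /* Illustrative snoop-report wires sampled by the memory controller. */
    typedef struct { bool dirty; bool snoop_valid; } SnoopLines;

    /* Fixed delay: snoop results are guaranteed valid a fixed number of
     * cycles after the address appears, so memory waits that long and
     * then simply checks the dirty line. */
    static bool memory_responds_fixed(SnoopLines sampled_after_fixed_delay) {
        return !sampled_after_fixed_delay.dirty;
    }

    /* Variable delay: memory assumes some cache will supply the data
     * until every cache has reported; it keeps sampling until the
     * snoop-valid indication is asserted (it may prefetch the data and
     * hold it just in case). */
    static bool memory_responds_variable(SnoopLines (*sample)(void)) {
        SnoopLines s;
        do { s = sample(); } while (!s.snoop_valid);
        return !s.dirty;
    }

    /* Bit per block: memory keeps a "modified in some cache" bit for
     * each block and needs no snoop report at all. */
    static bool memory_responds_bit_per_block(bool modified_bit) {
        return !modified_bit;
    }

    static SnoopLines sample_now(void) { return (SnoopLines){ false, true }; }

    int main(void) {
        printf("fixed: %d, variable: %d, bit-per-block: %d\n",
               memory_responds_fixed((SnoopLines){ false, true }),
               memory_responds_variable(sample_now),
               memory_responds_bit_per_block(false));
        return 0;
    }
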
Writebacks
  • Writebacks complicate the design because they involve two blocks, and hence two bus transactions: an incoming block and an outgoing (modified) block that is being replaced.
  • To allow the processor to continue working after a cache miss that causes a writeback, we would like to delay the writeback and first service the miss that caused it. This imposes two additional requirements (see the sketch below):
  • a) A write-back buffer is needed to temporarily hold the block being replaced while the new block is brought into the cache, until the bus can be reacquired for a second transaction that completes the writeback.
  • b) Bus transactions relevant to the buffered block must be handled
    • snoop the write-back buffer
    • which requires an address comparator added to snoop on the write-back buffer
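
A minimal sketch, with invented names and a single-entry buffer, of holding an evicted dirty block in a write-back buffer and snooping bus addresses against it:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define BLOCK_BYTES 64

    /* Illustrative write-back buffer holding one evicted dirty block
     * while the miss that displaced it is serviced first. */
    typedef struct {
        bool     valid;
        uint32_t addr;                 /* block address of the evicted block */
        uint8_t  data[BLOCK_BYTES];
    } WriteBackBuffer;

    /* Evict a dirty block into the buffer instead of writing it back now. */
    static void buffer_eviction(WriteBackBuffer *wb, uint32_t addr, const uint8_t *block) {
        wb->valid = true;
        wb->addr  = addr;
        memcpy(wb->data, block, BLOCK_BYTES);
    }

    /* Snoop path: every bus transaction's address is also compared against
     * the buffered block. On a match, this cache supplies the data from the
     * buffer (and, depending on the protocol, may cancel the pending write-back). */
    static bool snoop_hits_wb(const WriteBackBuffer *wb, uint32_t bus_addr) {
        return wb->valid && wb->addr == bus_addr;
    }

    int main(void) {
        WriteBackBuffer wb = {0};
        uint8_t dirty_block[BLOCK_BYTES] = { 0xAB };
        buffer_eviction(&wb, 0x8040, dirty_block);
        printf("bus 0x8040 hits WB buffer: %d\n", snoop_hits_wb(&wb, 0x8040));
        printf("bus 0x9000 hits WB buffer: %d\n", snoop_hits_wb(&wb, 0x9000));
        return 0;
    }
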

Basic Design

[Figure: basic snooping cache design. The processor initiates a transaction by placing the address and command on the bus; the bus-side controller snoops on the cache tags and the write-back buffer tags; a wired-OR signal serves as an acknowledgement to the initiator that all caches have seen the request and taken the appropriate action.]

  • Single-level write-back cache
  • Atomic bus
  • Bus arbitration not shown
  • Some other coordination signals not shown

Non-Atomic State Transitions
  • Memory operation involves many actions by many entities, incl. bus
    • Look up cache tags, bus arbitration, actions by other controllers, ...
    • Even if bus is atomic, overall set of actions is not
    • Can have race conditions among components of different operations

Suppose P1 and P2 attempt to write cached block A simultaneously

    • Each decides to issue a BusUpgr to allow the S -> M transition

See the example on the next slide.

  • Issues
    • Must handle requests for other blocks while waiting to acquire bus
    • Must handle requests for this block A
      • e.g. if P2 wins, P1 must invalidate copy and modify request to BusRdX

Example of Race Conditions
  • Suppose P1 and P2 cache the same block A in shared state S, and both simultaneously issue a write to block A.
  • Show how P1 may have a request outstanding, waiting for the bus, while a transaction from P2 appears on the bus. How do you solve this complication?

Scenario: P1's write checks its cache, determines that it needs to change the state of the block from S to M before it can write data into the block, and issues an upgrade bus request (BusUpgr: similar to BusRdX, but it does not cause memory or other caches to respond with the data for the block).

In the meantime, P2 has also issued a BusRdX for A, and it may have won arbitration for the bus first. P1's controller will see the bus transaction and must downgrade the state of the block from S to I in its cache.

Example of Race Conditions

Otherwise, when P2’s transaction is over, A will be in modified state in P2 and in shared state in P1, which violates the protocol.

But now the upgrade bus request that P1 has outstanding is no longer appropriate and must be replaced with a read-exclusive (BusRdX) request.

How do we implement this?

Solution
  • A controller must also be able to check addresses snooped from the bus against its own outstanding request and modify the latter if necessary.
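
As an illustration only (the types and names below are invented), the check could work as follows: while its own request waits for the bus, the controller compares every snooped address against the outstanding request, and a pending BusUpgr whose block is invalidated is converted into a BusRdX:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative request types for the pending-request logic. */
    typedef enum { REQ_NONE, REQ_BUS_UPGR, REQ_BUS_RDX } ReqType;

    typedef struct {
        ReqType  type;
        uint32_t addr;       /* block address of the outstanding request */
    } PendingRequest;

    /* Called for every transaction snooped from the bus while our own
     * request is still waiting. If another processor takes ownership of
     * the same block (e.g. via BusRdX), our copy is invalidated, so a
     * pending BusUpgr is no longer sufficient and must become a BusRdX
     * that also fetches the data. */
    static void snoop_against_pending(PendingRequest *req,
                                      uint32_t snooped_addr,
                                      bool snooped_takes_ownership) {
        if (req->type == REQ_BUS_UPGR &&
            req->addr == snooped_addr &&
            snooped_takes_ownership) {
            req->type = REQ_BUS_RDX;   /* we lost our copy; must re-fetch data */
        }
    }

    int main(void) {
        PendingRequest p1 = { REQ_BUS_UPGR, 0xA000 };   /* P1 waiting for the bus */
        /* P2's BusRdX for the same block appears on the bus first. */
        snoop_against_pending(&p1, 0xA000, true);
        printf("P1's pending request is now %s\n",
               p1.type == REQ_BUS_RDX ? "BusRdX" : "BusUpgr");
        return 0;
    }
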
Handling Non-atomicity: Transient States
  • Two types of states
    • Stable (e.g. MESI)
    • Transient or Intermediate

Example: in response to a write operation to a block in the S state, the cache controller:

1) Begins arbitration for the bus by issuing a bus request, and

2) Transitions to the intermediate state S -> M.

The state transitions to M upon a bus grant. However, if a BusRdX is observed first, the controller treats the block as having been invalidated and transitions to the I -> M state.

  • Increases complexity
      • e.g. some designs avoid BusUpgr and instead use other mechanisms to avoid the data transfer
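
Continuing the earlier FSM sketch (state and function names are invented), the transient states can be represented explicitly: a write to a shared block enters a transient S-to-M state, and an observed BusRdX while waiting moves it to an I-to-M state:

    #include <stdio.h>

    /* Stable MESI-style states plus two illustrative transient states
     * used while a write's bus request is outstanding. */
    typedef enum { ST_I, ST_S, ST_E, ST_M, ST_S_TO_M, ST_I_TO_M } State;

    /* Processor write to a block in S: issue a bus request and move to
     * the transient state until the bus is granted. */
    static State on_processor_write_shared(State s) {
        return (s == ST_S) ? ST_S_TO_M : s;
    }

    /* Bus grant while in a transient state completes the transition.
     * (From ST_I_TO_M the controller would issue BusRdX, not BusUpgr.) */
    static State on_bus_grant(State s) {
        return (s == ST_S_TO_M || s == ST_I_TO_M) ? ST_M : s;
    }

    /* A BusRdX observed from another processor invalidates our copy;
     * a pending upgrade must now behave like a read-exclusive miss. */
    static State on_observed_busrdx(State s) {
        if (s == ST_S_TO_M) return ST_I_TO_M;
        if (s == ST_S || s == ST_E || s == ST_M) return ST_I;
        return s;
    }

    int main(void) {
        State s = ST_S;
        s = on_processor_write_shared(s);   /* S -> S_TO_M, bus requested */
        s = on_observed_busrdx(s);          /* lost the race: -> I_TO_M   */
        s = on_bus_grant(s);                /* bus finally granted: -> M  */
        printf("final state = %d (ST_M = %d)\n", s, ST_M);
        return 0;
    }
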
