Noc for cache coherence l.jpg
This presentation is the property of its rightful owner.
Sponsored Links
1 / 25

NoC for Cache Coherence PowerPoint PPT Presentation


  • 207 Views
  • Uploaded on
  • Presentation posted in: General

NoC for Cache Coherence . NoC Seminar Technion Vainbaum Yuri Mentor I.Keidar. Cache coherence problem in NUCA . Update other L2$. T1. T2. P1. A. A. B. P3. B. L2$. L2$. P2. P4. L2$. L2$.

Download Presentation

NoC for Cache Coherence

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Noc for cache coherence l.jpg

NoC for Cache Coherence

NoC Seminar

Technion

Vainbaum Yuri

Mentor I.Keidar


Cache coherence problem in nuca l.jpg

Cache coherence problem in NUCA

Update other L2$

T1

T2

P1

A

A

B

P3

B

L2$

L2$

P2

P4

L2$

L2$

  • The cache coherency problem appears when tasks running on different processors in the SoC share data stored in the system memory.

  • When a task T1 running on a processor P1 modifies a data shared with task T2 , which runs on the processor P2, that data’s copy on P2 processor’s cache must be either updated or invalidated, before a new access to it.


Mesi maintain the coherence in cached systems l.jpg

MESI -maintain the coherence in cached systems

Modified: Actually, it is an exclusive-modified state. It means that the cache has the only copy that is correct in the whole system. The data which are in the main memory are wrong.

Exclusive: Exclusive without having been modified. That is, this cache is the only one that has the correct value of the block. Data blocks are according to the existing ones in the main memory.

Invalid: It is a non-valid state. The data you are looking for are not in the cache, or the local copy of these data is not correct because another processor has updated the corresponding memory

position.

Shared: Shared without having been modified. Another processor can have the data into the cache memory and both copies are in their current version.


In network cache coherence l.jpg

In-Network Cache Coherence

  • Propose :

    • Implementation of the coherence protocol and directories within the network at each router node.

    • This opens up the possibility of optimizing a protocol with in-transit actions

In-Network Cache Coherence, Noel Eisley,

The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)


In network read optimization l.jpg

In-Network Read optimization

H

A

B

Read request

Directory based MSI

To sheerer

data

  • Three end-to-end messages

In-Network Cache Coherence, Noel Eisley,

The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)


In network read optimization6 l.jpg

In-Network Read optimization

H

A

B

Read request

In-Network MSI

data

  • Node B “bumps” into node A

  • While message in-transit to the home node H obtain the data directly from A

In-Network Cache Coherence, Noel Eisley,

The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)


In network write optimization l.jpg

In-Network Write optimization

H

A

B

C

write request

Inv

Ack

Inv

Directory based MSI

Ack

data

In-Network Cache Coherence, Noel Eisley,

The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)


In network write optimization8 l.jpg

In-Network Write optimization

H

A

B

C

write request

Inv+Ack

In-network MSI

Ack +inv

data

  • This in-transit optimization can reduce write communication from two round-trips to a single round-trip from C to H and back

In-Network Cache Coherence, Noel Eisley,

The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)


In network cache coherence protocol l.jpg

In-Network cache coherence protocol

H

R

  • Idea: move coherence directories from the nodes into the network fabric

  • Virtual trees, one for each cache line, are maintained within the network in place of coherence directories to keep track of sharers

  • The virtual tree consists of one root node R which is the node that first loads a cache line from off-chip memory, all nodes that sharing this line and intermediate nodes between root and sharers

  • Nodes of the tree are connected by virtual links

  • Virtual trees are stored in virtual tree caches at each router within the network

  • Reads and writes are routed towards the home node, if they encounter a virtual tree in-transit, the virtual tree takes over as the routing function and steers read requests and write invalidates appropriately towards the sharers instead.

In-Network Cache Coherence, Noel Eisley,

The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)


In network read access example l.jpg

In-Network-Read access example

2. Load line

from off-chip

3. Constructs virtual tree

4. Hits virtual tree on the way to home node

H

H

read2

R2

5. Steered to

nearest copy

1.Towards home node

6. Returns data

and constructs

new virtual

tree links

R1

R1

read1

Second read request to the same line –Read2

New read request –Read1


In network router micro architecture l.jpg

In-Network Router micro-architecture

Flit

  • Virtual tree cache serves to steer head flits towards the appropriate output ports.

  • Virtual tree cache points them towards caches housing the most up-to-date data requested

  • Memory address contained in each packet’s header is first parsed into < tag, index,o f f set > if the tag matches, there is a hit in the tree cache, and its prescribed direction is used as the desired output port


In network results summary l.jpg

In-Network Results & Summary

  • Proposed an approach of cache coherence for chip multiprocessors where the coherence protocol and directories are all embedded within network routers.

  • This approach has a low hardware overhead which quickly leads to hardware savings, compared to the standard directory protocol, as the number of cores per chip increases

In-Network Cache Coherence, Noel Eisley,

The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)


Dcos directory cache on a switch l.jpg

DCOS-Directory Cache On a Switch

  • To reduce cache-to-cache data transfer time proposed architecture implemented inside each switch

  • 4x2 2D mesh topology MIPS R10000 core model ,Directory based cache coherence MSI protocol

DCOS: Cache Embedded Switch Architecture for

Distributed Shared Memory Multiprocessor SoCs , Daewook Kim 2006 IEEE


Dcos directory cache on a switch14 l.jpg

DCOS-Directory Cache On a Switch

  • State entry assigned to a memory block holds current state of block : empty, shared, modified /invalid

  • No data items are copied to caches or memories :Marked as “E”

DCOS: Cache Embedded Switch Architecture for

Distributed Shared Memory Multiprocessor SoCs , Daewook Kim 2006 IEEE


Dcos directory cache on a switch15 l.jpg

DCOS-Directory Cache On a Switch

  • Data is shared with other caches and memories

DCOS: Cache Embedded Switch Architecture for

Distributed Shared Memory Multiprocessor SoCs , Daewook Kim 2006 IEEE


Dcos switch architecture l.jpg

DCOS-Switch architecture

  • All directory caches are embedded within crossbar switch


Dcos results l.jpg

DCOS-Results

DCOS: Cache Embedded Switch Architecture for

Distributed Shared Memory Multiprocessor SoCs , Daewook Kim 2006 IEEE


Cache coherency communication cost l.jpg

Cache Coherency Communication Cost

  • How costly is cache coherency in interconnection terms?

  • This paper focuses on bringing light onto this question.

  • Directory based mechanism to maintain coherence among all caches in the system

Cache Coherency Communication Cost in a

NoC-based MPSoC Platform , Gustavo Girão, SBCCI’07, September 3–6, 2007, Rio de Janeiro, Brazil.


Cache coherency communication cost19 l.jpg

Cache Coherency Communication Cost

  • The amount of data on the NoC for regular operations is much larger than the amount of data for cache coherence maintenance for almost all the cache sizes

  • The increase in cache size decreases the amount of data for regular operations, and so the amount of data for cache coherence becomes more significant

Cache Coherency Communication Cost in a

NoC-based MPSoC Platform , Gustavo Girão, SBCCI’07, September 3–6, 2007, Rio de Janeiro, Brazil.


Cache coherency communication cost20 l.jpg

Cache Coherency Communication Cost

  • Graph shows that the amount of page replacement requests is the

  • most responsible for the cache coherence injected load for small

  • cache sizes. This happens because the amount of replacements

  • increases as cache size decreases

8 CPUs,

1 directory

Cache Coherency Communication Cost in a

NoC-based MPSoC Platform , Gustavo Girão, SBCCI’07, September 3–6, 2007, Rio de Janeiro, Brazil.


Benoc bus enhanced network on chip l.jpg

BeNOC –Bus enhanced Network on Chip

  • Low latency, low bandwidth specialized bus, optimized for system-wide distribution of control signals (ack,invl)

  • High performance distributed network that handles high throughput

  • data communication between pairs of modules

BENoC: A Bus-Enhanced Network on-Chip for a Power Efficient CMP

Isask'har Walter, Israel Cidon, and Avinoam Kolodny


Benoc bus enhanced network on chip22 l.jpg

BeNOC –Bus enhanced Network on Chip

  • β -reflects the network-to-bus broadcastlatency ratio

  • n- The number of modules in the system

  • When broadcast operations are compared,the bus is considerably more energy efficient than thenetwork

BENoC: A Bus-Enhanced Network on-Chip for a Power Efficient CMP

Isask'har Walter, Israel Cidon, and Avinoam Kolodny


Network topology awareness l.jpg

Network topology awareness

Invl.

P1

L2$

P2

L2$

Invl.

P3

L2$

  • Wait for furthest invalidation acknowledgment therefore send Invl to P3 first

  • Cache coherence protocol should be aware of the network topology

  • Send invalidation messages according to distances from the directory


Network topology awareness24 l.jpg

Network topology awareness

  • Calculate at each transaction furthest sharing node and send invalidation

  • The total delay will be roundtrip time of invalidation/acknowledge to furthest node

  • Send long delay roundtrip messages first to mask short delay messages.


References l.jpg

References

In-Network Cache Coherence, Noel Eisley,

The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)

DCOS: Cache Embedded Switch Architecture for

Distributed Shared Memory Multiprocessor SoCs , Daewook Kim 2006 IEEE

BENoC: A Bus-Enhanced Network on-Chip for a Power Efficient CMP

Isask'har Walter, Israel Cidon, and Avinoam Kolodny

Cache Coherency Communication Cost in a

NoC-based MPSoC Platform , Gustavo Girão, SBCCI’07, September 3–6, 2007, Rio de Janeiro, Brazil.

TEACHING THE CACHE MEMORY COHERENCE WITH THE MESI

PROTOCOL SIMULATOR F. J. JIMÉNEZ1, J. GÓMEZ1, A. MESONES1, E. HERRUZO1, J. I. BENAVIDES1 Y F. J. SÁNCHEZ2

1Dpto. Electrotecnia y Electrónica. Escuela Politécnica Superior. Universidad de Córdoba. Av.

Menéndez Pidal s/n. 14081. Córdoba. Spain.

On cache coherency and memory consistency issues in NoC based shared

memory multiprocessor SoC architectures Proceedings of the 9th EUROMICRO Conference on Digital System Design (DSD'06)

Exploration of distributed shared memory architectures

for NoC-based multiprocessors

Matteo Monchiero, Gianluca Palermo, Cristina Silvano *, Oreste Villa


  • Login