CSS434 Distributed Shared Memory

Textbook Ch18

Professor: Munehiro Fukuda

CSS434 DSM

Basic Concept

[Figure: Nodes 0, 1, and 2, each with CPUs 1 to n, a local memory, and an MMU with a page manager, are connected by a communication network. A single distributed shared memory address space (exists only virtually) spans all nodes.]

write(address, data);
data = read(address);

A cache line or a page is transferred to and cached in the requesting computer.

Writer Process on DSM

#include "world.h"

struct shared { int a, b; };

/* Program Writer */
int main(void)
{
    struct shared *p;

    methersetup();                    /* initialize the Mether run-time */
    p = (struct shared *)METHERBASE;  /* overlay structure on METHER segment */
    p->a = p->b = 0;                  /* initialize fields to zero */
    while (TRUE) {                    /* continuously update structure fields */
        p->a = p->a + 1;
        p->b = p->b - 1;
    }
}

Reader Process on DSM

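#include <stdio.h>
#include <unistd.h>   /* sleep() */
#include "world.h"

struct shared { int a, b; };          /* same layout as in the writer */

/* Program Reader */
int main(void)
{
    struct shared *p;

    methersetup();                    /* initialize the Mether run-time */
    p = (struct shared *)METHERBASE;  /* overlay structure on METHER segment */
    while (TRUE) {                    /* read the fields once every second */
        printf("a = %d, b = %d\n", p->a, p->b);
        sleep(1);
    }
}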

Why DSM?
  • Simpler abstraction
    • The tedious underlying communication primitives are hidden behind ordinary memory accesses.
  • Better portability of distributed application programs
    • Natural transition from sequential to distributed applications
  • Better performance of some applications
    • Data locality, on-demand data movement, and a large memory space reduce network traffic and paging/swapping activity.
  • Flexible communication environment
    • The sender and receiver need not know each other. They need not even coexist.
  • Ease of process migration
    • Migration is completed by transferring only the corresponding PCB to the destination.

Main Issues
  • Granularity
    • From fine (less false sharing but more network traffic) to coarse (more false sharing but less network traffic): cache line (e.g., Dash and Alewife), object (e.g., Orca and Linda), page (e.g., Ivy)
  • Memory coherence and access synchronization
    • Strict, Sequential, Causal, Weak, and Release Consistency models
  • Data location and access
    • Broadcasting, centralized data locator, fixed distributed data locator, and dynamic distributed data locator
  • Replacement strategy
    • LRU or FIFO (The same issue as OS virtual memory)
  • Thrashing
    • How to prevent a block from being exchanged back and forth between two nodes.
  • Heterogeneity
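
To make the granularity trade-off concrete, the sketch below maps byte offsets to coherence blocks and reports whether two unrelated variables land in the same block, which is the precondition for false sharing. The function names and the block sizes used in the usage note (a 64-byte cache line and a 4096-byte page) are assumptions for illustration, not values from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index of the coherence block that contains a byte offset
 * within the shared address space. */
size_t block_of(size_t offset, size_t block_size)
{
    assert(block_size > 0);
    return offset / block_size;
}

/* Two distinct variables can be falsely shared only if they
 * live in the same coherence block. */
bool may_false_share(size_t off_a, size_t off_b, size_t block_size)
{
    return block_of(off_a, block_size) == block_of(off_b, block_size);
}
```

With these assumed sizes, variables at offsets 0 and 100 share a 4096-byte page but sit in different 64-byte cache lines, so only the page-based system would ship the block back and forth when two nodes write them independently.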


Consistency Models: Two Processes Accessing Shared Variables

At the beginning a = b = 0;

Process 1:
    br := b;
    ar := a;
    if (ar ≥ br) then
        print("OK");

Process 2:
    a := a + 1;
    b := b + 1;

[Figure: three possible observations by Process 1. Condition satisfied: a == 1, b == 1. Condition satisfied: a == 1, b == 0. Third case: b == 1, a == 0. This may happen if new contents are transmitted through a different route.]

DSM needs a consistency model.
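
The need for a consistency model can be checked by brute force. The sketch below, a simplified model rather than anything from the slides, enumerates every interleaving of the reader's two steps (br := b; ar := a) and the writer's two steps (a := a + 1; b := b + 1). The flag `writes_in_order` is a hypothetical knob: when it is false, the writer's updates become visible in the opposite order, which stands in for updates arriving through different routes.

```c
#include <stdbool.h>

/* Returns true if ar >= br holds in every interleaving of
 * reader steps (br := b; ar := a) and writer steps (a++; b++).
 * If writes_in_order is false, the update to b becomes visible
 * before the update to a. */
bool condition_always_holds(bool writes_in_order)
{
    /* Bit i of mask set: step i belongs to the writer. */
    for (unsigned mask = 0; mask < 16; mask++) {
        int writer_steps = 0;
        for (int i = 0; i < 4; i++)
            writer_steps += (int)((mask >> i) & 1u);
        if (writer_steps != 2)
            continue;           /* need exactly 2 writer and 2 reader steps */

        int a = 0, b = 0, ar = 0, br = 0;
        int w = 0, r = 0;       /* steps taken so far by writer and reader */
        for (int i = 0; i < 4; i++) {
            if ((mask >> i) & 1u) {
                bool first = (w++ == 0);
                if (first == writes_in_order)
                    a++;        /* update to a becomes visible */
                else
                    b++;        /* update to b becomes visible */
            } else {
                if (r++ == 0)
                    br = b;
                else
                    ar = a;
            }
        }
        if (ar < br)
            return false;       /* reader saw b == 1 but a == 0 */
    }
    return true;
}
```

When updates become visible in program order, the condition holds in all six interleavings; once the order can flip, at least one interleaving yields b == 1 with a == 0.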

Consistency Models: Strict Consistency
  • Wi(x, a): Processor i writes a to variable x (i.e., x = a;).
  • bRi(x): Processor i reads b from variable x. (i.e., y = x; && y == b;).
  • Any read on x must return the value of the most recent write on x.

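[Figure: timelines for P1, P2, and P3 in two panels, "Strict Consistency" and "Not Strict Consistency". Operations shown: W2(x, a), nilR1(x), aR1(x), and aR3(x). In the non-strict panel, a read by P1 still returns nil (nilR1(x)) even though W2(x, a) has been issued.]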

Consistency Models: Linearizability and Sequential Consistency
  • Linearizability: Operations of each individual process appear to all processes in the same order as they happen.
  • Sequential Consistency: Operations of each individual process appear in the same order to all processes.

[Figure: timelines for P1 to P4 in two panels, "Linearizability" and "Sequential Consistency". Operations shown: writes W2(x, a) and W3(x, b), and reads at P1 and P4 that return nil, a, or b.]

Consistency Models: FIFO and Processor Consistency
  • FIFO Consistency: Writes by a single process are visible to all other processes in the order in which they were issued.
  • Processor Consistency: FIFO consistency, plus all writes to the same memory location must be visible in the same order.

[Figure: timelines for P1 to P4 in two panels, "FIFO Consistency" and "Processor Consistency". Writes shown: W2(x, a), W2(x, b), W2(y, a), W3(x, 0), W3(x, 1), W3(y, 0), W3(y, 1), W3(z, 1), and W3(z, a). The remaining entries are reads of x, y, and z that return a, b, 0, or 1.]

Consistency Models: Causal Consistency
  • Causally related writes must be visible to all processes in the same order. Concurrent writes may be propagated in a different order.

[Figure: timelines for P1 to P4 in two panels, "Causal Consistency" and "Not Causal Consistency". Operations shown: W2(x, a), W2(x, c), W3(x, b), aR3(x), and reads at P1 and P4 that return a, b, or c.]

Consistency Models: Weak Consistency
  • Accesses to synchronization variables must obey sequential consistency.
  • All previous writes must be completed before an access to a synchronization variable.
  • All previous accesses to synchronization variables must be completed before an access to a non-synchronization variable.

[Figure: timelines for P1, P2, and P3 in two panels, "Weak Consistency" and "Not Weak Consistency". Operations shown: W2(x, a), W2(x, b), W2(y, c), synchronization accesses S1, S2, and S3, and reads that return a or b for x and nil or c for y.]

Consistency Models: Release Consistency
  • Accesses to acquire and release variables obey processor consistency.
  • Previous acquires requested by a process must be completed before the process performs a data access.
  • All previous data accesses performed by a process must be completed before the process performs a release.

[Figure: timelines for P1, P2, and P3. P1: Acq1(L), W1(x, a), W1(x, b), Rel1(L). P2: Acq2(L), bR2(x), Rel2(L). P3: aR3(x), with no acquire of L.]


Consistency Models: Release Consistency (Example)

Process 1:
    acquireLock();   // enter critical section
    a := a + 1;
    b := b + 1;
    releaseLock();   // leave critical section

Process 2:
    acquireLock();   // enter critical section
    print("The values of a and b are: ", a, b);
    releaseLock();   // leave critical section
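
One way to picture what release consistency permits is a single-threaded simulation in which a process's writes stay in a private copy until it releases the lock. The sketch below assumes hypothetical types `shared_vars` and `process_view` and does not model mutual exclusion itself; it only shows when updates must become visible.

```c
#include <stdbool.h>

/* Globally visible copy of the two shared variables. */
struct shared_vars { int a, b; };

/* A process's private view, plus whether it holds unpropagated writes. */
struct process_view {
    struct shared_vars local;
    bool dirty;
};

/* Acquire: bring the local copy up to date before any data access. */
void acquire_lock(struct process_view *p, const struct shared_vars *global)
{
    p->local = *global;
    p->dirty = false;
}

/* Release: all earlier data accesses must complete, so pending
 * writes are propagated before the lock is given up. */
void release_lock(struct process_view *p, struct shared_vars *global)
{
    if (p->dirty) {
        *global = p->local;
        p->dirty = false;
    }
}

/* Body of Process 1's critical section from the example. */
void increment_both(struct process_view *p)
{
    p->local.a = p->local.a + 1;
    p->local.b = p->local.b + 1;
    p->dirty = true;
}
```

In this model the global copy still holds a = b = 0 while Process 1 is inside its critical section, and Process 2 sees a = b = 1 only after Process 1 releases and Process 2 acquires.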


Implementing Sequential Consistency: Replicated and Migrating Data Blocks

[Figure: Nodes 1, 2, and 3, each with a processor, a cache, and a memory holding data blocks (x, y, m, n, a, b). Some blocks, including x, are duplicated on more than one node.]

Then what if Node 2 updates x?


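Implementing Sequential Consistency: Write Invalidation

Client wants to write:
  1. Request block
  2. Replicate block
  3. Invalidate block

[Figure: the writing client requests the block and receives a replica, which becomes the new copy. Invalidation messages go to the two other nodes that hold a copy of the block.]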
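
The three write-invalidation steps (request, replicate, invalidate) can be sketched as a small simulation. The `struct block` layout, the fixed cluster size `NODES`, and the choice to make the writer the new owner are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NODES 3   /* assumed cluster size */

/* One block as seen by every node: who holds a valid copy,
 * what each copy contains, and who currently owns the block. */
struct block {
    bool valid[NODES];
    int  data[NODES];
    int  owner;
};

/* Node `writer` writes `value` using write invalidation. */
void write_invalidate(struct block *blk, int writer, int value)
{
    assert(writer >= 0 && writer < NODES);

    /* Steps 1 and 2: request the block and replicate it from the owner. */
    if (!blk->valid[writer]) {
        blk->data[writer] = blk->data[blk->owner];
        blk->valid[writer] = true;
    }

    /* Step 3: invalidate every other copy. */
    for (int n = 0; n < NODES; n++)
        if (n != writer)
            blk->valid[n] = false;

    /* The writer now holds the only valid (new) copy. */
    blk->data[writer] = value;
    blk->owner = writer;
}
```

After the call, only the writer's copy is valid, so any later read on another node has to fault and fetch the new contents.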


Implementing Sequential Consistency: Write Update

Client wants to write:
  1. Request block
  2. Replicate block
  3. Update block

[Figure: the writing client requests the block and receives a replica. Update messages go to the two other nodes that hold a copy of the block, so every copy becomes a new copy.]
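
For comparison with write invalidation, the write-update steps can be sketched the same way: instead of discarding remote copies, the writer pushes the new contents to them. The `struct replicated_block` layout, the cluster size `UPDATE_NODES`, and the message count returned are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define UPDATE_NODES 3   /* assumed cluster size */

struct replicated_block {
    bool valid[UPDATE_NODES];
    int  data[UPDATE_NODES];
    int  owner;              /* node whose copy is used for replication */
};

/* Node `writer` writes `value` using write update.
 * Returns the number of update messages sent to other nodes. */
int write_update(struct replicated_block *blk, int writer, int value)
{
    assert(writer >= 0 && writer < UPDATE_NODES);
    int messages = 0;

    /* Steps 1 and 2: request and replicate the block if needed. */
    if (!blk->valid[writer]) {
        blk->data[writer] = blk->data[blk->owner];
        blk->valid[writer] = true;
    }
    blk->data[writer] = value;

    /* Step 3: push the new contents to every other valid copy. */
    for (int n = 0; n < UPDATE_NODES; n++) {
        if (n != writer && blk->valid[n]) {
            blk->data[n] = value;
            messages++;
        }
    }
    return messages;
}
```

Every copy stays valid, at the cost of one update message per remote copy on each write.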

Implementing Sequential Consistency: Read/Write Request

[State-transition diagram. States: Unused, Nil, Read only, Read-owned, Writable. Transition labels:
  • Read (read a copy from the owner)
  • Read (read from memory and get an ownership)
  • Write (invalidate others if they have a copy and get an ownership), shown twice
  • Write (invalidate others if they have a copy)
  • Write invalidate, shown three times
  • Replacement, shown four times]
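
The block states and transition labels can be written down as a transition function. The sketch below is one plausible reading, not the slide's exact diagram: which arrow connects which pair of states is an assumption, and the enum and function names are hypothetical.

```c
enum block_state {
    ST_UNUSED, ST_NIL, ST_READ_ONLY, ST_READ_OWNED, ST_WRITABLE
};

enum block_event {
    EV_READ_FROM_OWNER,    /* read a copy from the owner */
    EV_READ_FROM_MEMORY,   /* read from memory and get an ownership */
    EV_WRITE,              /* invalidate other copies, own the block */
    EV_WRITE_INVALIDATE,   /* another node wrote the block */
    EV_REPLACEMENT         /* block chosen for replacement */
};

/* Assumed transitions for one node's copy of a block. */
enum block_state next_state(enum block_state s, enum block_event e)
{
    switch (e) {
    case EV_READ_FROM_OWNER:
        return (s == ST_UNUSED || s == ST_NIL) ? ST_READ_ONLY : s;
    case EV_READ_FROM_MEMORY:
        return (s == ST_UNUSED || s == ST_NIL) ? ST_READ_OWNED : s;
    case EV_WRITE:
        return ST_WRITABLE;
    case EV_WRITE_INVALIDATE:
        return (s == ST_UNUSED) ? ST_UNUSED : ST_NIL;
    case EV_REPLACEMENT:
        return ST_UNUSED;
    }
    return s;
}
```

Under this reading, any write ends in Writable, any remote write drops a held copy to Nil, and replacement always frees the slot.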


Implementing Sequential Consistency: Locating Data (Fixed Distributed-Server Algorithms)

[Figure: Processors 0, 1, and 2 each hold a table of block addresses (Addr0 to Addr8) with access states (writable, read owned, read only). A "Read addr2" request goes through three stages: read request, location search, and block replication. Addr2 is read owned at its owner, and the requester ends up with Addr2 as read only.]
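
A fixed distributed-server scheme can be sketched as a static function from block address to the processor that keeps its ownership entry. The modulo mapping, the table layout, and the names `manager_of` and `read_fault` are assumptions for illustration; the slides do not say which static mapping is used.

```c
#include <assert.h>

#define NUM_PROCESSORS 3
#define NUM_BLOCKS     9

/* Ownership entry for one block. In a real system the entry for
 * addr would be stored on processor manager_of(addr). */
struct ownership_entry {
    int owner;          /* processor that owns the block */
    unsigned copyset;   /* bit p set: processor p holds a read-only copy */
};

struct lookup {
    int manager;        /* fixed server asked during the location search */
    int owner;          /* processor that supplies the replica */
};

/* Assumed static mapping from block address to its fixed server. */
int manager_of(int addr)
{
    assert(addr >= 0 && addr < NUM_BLOCKS);
    return addr % NUM_PROCESSORS;
}

/* Read fault by `reader`: location search, then block replication. */
struct lookup read_fault(struct ownership_entry *table, int addr, int reader)
{
    assert(reader >= 0 && reader < NUM_PROCESSORS);
    struct lookup result;
    result.manager = manager_of(addr);
    result.owner = table[addr].owner;
    if (result.owner != reader)
        table[addr].copyset |= 1u << reader;   /* reader gets a read-only copy */
    return result;
}
```

Because the mapping is fixed, every processor can compute the server for an address locally, with no broadcast.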


Implementing Sequential Consistency: Locating Data (Dynamic Distributed-Server Algorithms)
  • Breaking the chain of nodes:
    • When the node receives an invalidation
    • When the node relinquishes ownership
    • When the node forwards a fault request
  • The node points to a new owner

[Figure: Processors 0, 1, and 2 each hold a table of block addresses (Addr0 to Addr8) with access states (writable, read owned, read only) and a probable-owner pointer labeled p1. A "Read addr2" request goes through three stages: read request, location search along the chain of nodes, and block replication.]
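
The chain of nodes can be sketched with a per-node "probable owner" hint. The version below models a fault that transfers ownership to the requester, with every node that forwards the request re-pointing its hint at the new owner. The array representation, the cluster size `DYN_NODES`, and the function name are assumptions for illustration.

```c
#include <assert.h>

#define DYN_NODES 4   /* assumed cluster size */

/* prob_owner[n] is node n's hint for who owns the block.
 * The true owner points to itself.
 *
 * Forwards a fault request from `requester` along the hint chain
 * and makes the requester the new owner. Every node on the chain
 * is re-pointed at the requester, which shortens later searches.
 * Returns the number of forwarding hops. */
int locate_and_take_ownership(int prob_owner[DYN_NODES], int requester)
{
    assert(requester >= 0 && requester < DYN_NODES);
    int hops = 0;
    int node = prob_owner[requester];

    while (prob_owner[node] != node) {     /* not the real owner yet */
        int next = prob_owner[node];
        prob_owner[node] = requester;      /* node forwards a fault request */
        node = next;
        hops++;
        assert(hops < DYN_NODES);          /* hints must not form a cycle */
    }

    prob_owner[node] = requester;          /* old owner relinquishes ownership */
    prob_owner[requester] = requester;     /* requester is the new owner */
    return hops;
}
```

Starting from hints 0 → 1 → 2 → 3 with node 3 as owner, a fault at node 0 takes two forwarding hops, and afterwards every node points directly at node 0.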

Replacement Strategy
  • Which block to replace
    • Non-usage based (e.g. FIFO)
    • Usage based (e.g. LRU)
    • A mix of both (e.g., Ivy)
      • Unused/Nil: replaced with the highest priority
      • Read-only: the second priority
      • Read-owned: the third priority
      • Writable: the lowest priority, with LRU used to choose among them
  • Where to place a replaced block
    • Invalidating a block if other nodes have a copy.
    • Using secondary store
    • Using the memory space of other nodes

Thrashing
  • Thrashing:
    • Two or more processes try to write the same shared block.
    • An owner keeps writing its block shared by two or more reader processes.
    • The larger a block, the more chances of false sharing that causes thrashing.
  • Solutions:
    • Allow a process to prevent a block from being accessed by others, using a lock.
    • Allow a process to hold a block for a certain amount of time.
    • Apply a different coherence algorithm to each block.
  • What do those solutions require users to do?
  • Are there any perfect solutions?

Paper Review by Students
  • IVY
  • Dash
  • Munin
  • Linda/Jini/JavaSpace
  • Discussions:
    • Classify which system is based on sequential consistency, release consistency, and lazy release consistency.
    • Classify the shared data granularity of these systems: cache-line based, page-based, and object-based.
    • Classify the implementation of these systems: hardware implementation, OS implementation, and User-level implementation.

Non-Turn-In Exercises
  • Is the memory underlying the following execution of two processes sequentially consistent (assuming that, initially, all variables are set to zero)?
    • P1: R(x)1; R(x)2; W(y)1
    • P2: W(x)1; R(y)1; W(x)2
  • Show that the following history is not causally consistent.
    • P1: W(a)0; W(a)1
    • P2: R(a)1; W(b)2
    • P3: R(b)2; R(a)0
  • Explain the relationship between false sharing and data granularity in DSM.

Non-Turn-In Exercises

  • There is a DSM system that is based on the write-invalidation protocol, uses a fixed distributed-server algorithm for locating a given data item, and consists of three processors: 1, 2, and 3. Each processor has the following data items and an ownership/sharing-processor table.

[Figure: Processors 1, 2, and 3, each with an ownership table (columns: addr, owner, shared) and a set of data items. Table cells as extracted, in order: 6, P3, 3, P2, 0, P0, 4, 7, P2, 4, P3, 1, P0, P3, 8, P2, 5, P0, 2, P3. Data items as extracted, in order: addr 2, addr 3, addr 0, addr 4, addr 7, addr 1, addr 6, addr 8. Other labels: event, copy addr1.]

Non-Turn-In Exercises
  • Given the following sequence of memory accesses, draw additional arrows and circles in the above figure as instructed. To distinguish which arrow corresponds to which operation, add the operation number (1 to 8) to each arrow. Also, update the corresponding ownership table entries.
  • (1) Memory access #1: Processor 2 reads data from address 2.
    • Add arrows in the above figure to indicate the operations required for memory access #1.
      • 1. Send a query to search for address 2
      • 2. Send a request to read from address 2
      • 3. Read data from address 2 to Processor 2
    • Update the corresponding ownership table entry. (Just add P2 in the "shared" field.)
    • Draw a circle to indicate that a copy of address 2 was created on Processor 2.
  • (2) Memory access #2: Processor 1 reads data from address 2.
    • Add arrows in the above figure to indicate the operations required for memory access #2.
      • 4. Send a query to search for address 2
      • 5. Send a request to read from address 2
      • 6. Read data from address 2 to Processor 1
    • Update the corresponding ownership table entry. (Just add P1 in the "shared" field.)
    • Draw a circle to indicate that a copy of address 2 was created on Processor 1.
  • (3) Memory access #3: Processor 2 writes data to address 2.
    • Add arrows in the above figure to indicate the operations required for memory access #3.
      • 7. Send a request to update the ownership information on address 2
      • 8. Send a write invalidation to all non-owner processors sharing address 2
    • Update the corresponding ownership table entry. (Make Processor 2 the new owner of address 2 and cross out all other processor IDs in the entry.)
    • Cross out all circles to indicate that the old copies of address 2 were all invalidated.
