
TECHNIQUES FOR REDUCING CONSISTENCY-RELATED COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS




  1. TECHNIQUES FOR REDUCING CONSISTENCY-RELATED COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS • J. B. Carter, University of Utah • J. K. Bennett and W. Zwaenepoel, Rice University

  2. INTRODUCTION • Distributed shared memory is a software abstraction that allows a set of workstations connected by a LAN to share a single paged virtual address space • The key issue in building a software DSM is minimizing the amount of data communication among the workstation memories

  3. Why bother with DSM? • The key idea is to build fast parallel computers that • are cheaper than conventional architectures • are convenient to use • The conventional parallel computer architecture was the shared-memory multiprocessor

  4. Conventional parallel architecture • [Figure: four CPUs, each with its own cache, connected to one shared memory]

  5. Today’s architecture • Clusters of workstations are much more cost-effective • No need to develop complex bus and cache structures • Can use off-the-shelf networking hardware • Gigabit Ethernet • Myrinet (1.5 Gb/s) • Can quickly integrate the newest microprocessors

  6. Limitations of the cluster approach • Communication within a cluster of workstations is through message passing • Much harder to program than concurrent access to a shared memory • Many big programs were written for shared-memory architectures • Converting them to a message-passing architecture is a nightmare

  7. Distributed shared memory • [Figure: the separate main memories of the workstations, combined so that DSM = one shared global address space]

  8. Distributed shared memory • DSM makes a cluster of workstations look like a shared-memory parallel computer • Easier to write new programs • Easier to port existing programs • The key problem is that DSM only provides the illusion of a shared-memory architecture • Data must still move back and forth among the workstations

  9. Characterizing a DSM (I) • Four important issues: 1. Size of transfer units (level of granularity) • Big units are more efficient • Virtual memory pages • Can cause false sharing whenever a page contains different variables that are accessed at the same time by different processors

  10. False Sharing • [Figure: one processor accesses x, another accesses y, and x and y lie on the same page] • The page containing x and y will move back and forth between the main memories of the workstations
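A toy model makes the ping-pong concrete. The sketch below assumes a single-writer protocol in which each coherence unit (a page, or a single variable) has exactly one owner and must move whenever a different processor writes it; the 4096-byte page size and the addresses of x and y are hypothetical. It counts transfers for the same interleaved trace at page granularity and at per-variable granularity.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def count_transfers(accesses, granularity):
    """Count ownership transfers of coherence units under single-writer ownership.

    accesses: list of (processor, address) writes in program order.
    granularity: size in bytes of the unit that moves between workstations.
    The initial fetch of a unit is not counted.
    """
    owner = {}  # coherence unit -> processor that currently holds it
    transfers = 0
    for proc, addr in accesses:
        unit = addr // granularity
        if unit in owner and owner[unit] != proc:
            transfers += 1  # the unit must move to the writing processor
        owner[unit] = proc
    return transfers


X_ADDR, Y_ADDR = 0x1000, 0x1008         # hypothetical addresses, same page
trace = [(0, X_ADDR), (1, Y_ADDR)] * 4  # P0 writes x, P1 writes y, interleaved

print(count_transfers(trace, PAGE_SIZE))  # 7: every write after the first moves the page
print(count_transfers(trace, 8))          # 0: x and y live in separate units
```

With page-sized units the page changes hands on every write even though the two processors never touch the same variable; with per-variable units nothing moves at all.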

  11. Characterizing a DSM (II) 2. Consistency model • Strict consistency is not possible • Various authors have proposed weak consistency models • Cheaper to implement • Harder to use correctly

  12. Characterizing a DSM (III) 3. Portability of programs • Some DSMs allow programs written for a multiprocessor architecture to run on a cluster of workstations without any modifications (dusty decks) • More efficient DSMs require more changes 4. Portability of the DSM • Some DSMs require specific OS features

  13. MUNIN • Developed at Rice University • Based on software objects (variables) • Uses the processor's virtual memory hardware to detect accesses to the shared objects • Includes several techniques for reducing consistency-related communication • Only runs on top of the V kernel

  14. Key features • Software release consistency: only requires the memory to be consistent at specific synchronization points • Multiple consistency protocols: allow the user to select the best consistency protocol for each data item • Write-shared protocols: reduce false sharing • An update-with-timeout mechanism

  15. SW RELEASE CONSISTENCY (I) • Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables • P(&mutex) and V(&mutex) • lock(&csect) and unlock(&csect) • request() and release() • Unprotected accesses can produce unpredictable results

  16. SW RELEASE CONSISTENCY (II) • SW release consistency only guarantees the correctness of operations performed within a request/release pair • No need to propagate new values of shared variables until the release • Must guarantee that a workstation has received the most recent values of all shared variables when it completes a request

  17. SW RELEASE CONSISTENCY (III)
      First process:
          shared int x;
          request();
          x = 1;
          release();   // propagate x = 1
      Second process:
          shared int x;
          request();   // wait for new value of x
          x++;
          release();   // propagate x = 2
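The two-process exchange above can be simulated in a few lines. This is a minimal sequential sketch with hypothetical `Cluster` and `Node` classes, not Munin's implementation: writes inside a critical section are only buffered locally, and `release()` pushes them to every other replica (eager release). Because the model is sequential, `request()` raises instead of blocking when the lock is taken.

```python
class Node:
    """A workstation holding a local replica of the shared variables."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.replica = {"x": 0}  # local copy of shared int x
        self.pending = {}        # writes not yet propagated

    def request(self):
        if self.cluster.lock_holder is not None:
            raise RuntimeError("lock is held by " + self.cluster.lock_holder)
        self.cluster.lock_holder = self.name

    def read(self, var):
        return self.replica[var]

    def write(self, var, value):
        self.replica[var] = value
        self.pending[var] = value  # buffered until release

    def release(self):
        # Eager release: push the buffered writes to every other replica now.
        for other in self.cluster.nodes.values():
            if other is not self:
                other.replica.update(self.pending)
        self.pending.clear()
        self.cluster.lock_holder = None


class Cluster:
    def __init__(self, names):
        self.lock_holder = None
        self.nodes = {n: Node(n, self) for n in names}


cluster = Cluster(["A", "B"])
a, b = cluster.nodes["A"], cluster.nodes["B"]

a.request()
a.write("x", 1)
print(b.read("x"))  # 0: the new value is still buffered at A
a.release()         # propagate x = 1

b.request()
b.write("x", b.read("x") + 1)
b.release()         # propagate x = 2
print(a.read("x"))  # 2
```

The first `print` shows that, between request and release, another workstation may still see the old value; the program is only guaranteed the new one after the lock has passed through a release.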

  18. SW RELEASE CONSISTENCY (IV) • Munin uses eager release: new values of shared variables are propagated at release time • Lazy release delays propagation until a request is issued (TreadMarks) • A workstation issuing a request gets the current values of all shared variables • Shared variables are not associated with a particular critical section (as they are in Midway)
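To see why lazy release can save messages, the sketch below counts update messages for one lock under a deliberately simplified model: each of `num_nodes` workstations holds a replica, eager release sends one message to every other replica at each release, and lazy release sends one message only when the lock passes to a different workstation. Real lazy protocols such as TreadMarks exchange more detailed consistency information, so treat the counts as illustrative only.

```python
def eager_messages(acquire_order, num_nodes):
    """Eager release: every release pushes updates to all other replicas."""
    return len(acquire_order) * (num_nodes - 1)


def lazy_messages(acquire_order):
    """Lazy release: updates move only when the lock changes hands."""
    messages = 0
    previous = None
    for node in acquire_order:
        if previous is not None and node != previous:
            messages += 1  # the new acquirer fetches the releaser's updates
        previous = node
    return messages


# Hypothetical order in which workstations acquire a single lock.
order = [0, 1, 0, 1, 2, 2]
print(eager_messages(order, num_nodes=4))  # 18
print(lazy_messages(order))                # 4
```

Under this model the eager cost grows with the number of replicas, while the lazy cost depends only on how often the lock migrates.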

  19. Munin Implementation (I) • Three kinds of variables: • Ordinary variables: can only be accessed by the process that created them • Shared data variables: should always be accessed from within critical regions • Synchronization variables: • locks, barriers or condition variables • must be accessed through special library procedures

  20. Munin Implementation (II) • When a processor modifies shared data inside a critical region, all update messages are buffered and delayed until the processor leaves the critical region • Processes accessing shared data variables outside critical regions do so at their own risk • Same as with the shared-memory model • But the risk is higher
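Buffering also lets several writes to the same variable collapse into one update. The hypothetical `UpdateBuffer` class below keeps only the latest value per variable inside the critical region and, on exit, sends one combined message per remote copy; it assumes every remote copy receives the same message.

```python
class UpdateBuffer:
    """Writes made inside one critical region, held until the region is left."""

    def __init__(self, remote_copies):
        self.remote_copies = remote_copies  # other processors holding a copy
        self.pending = {}
        self.writes = 0

    def write(self, var, value):
        self.pending[var] = value  # a later write replaces an earlier one
        self.writes += 1

    def release(self):
        """Leave the critical region: one combined message per remote copy."""
        message = dict(self.pending)
        sent = self.remote_copies if message else 0
        self.pending.clear()
        self.writes = 0
        return message, sent


buf = UpdateBuffer(remote_copies=3)
buf.write("x", 1)
buf.write("x", 2)
buf.write("y", 5)
unbuffered = buf.writes * buf.remote_copies  # one message per write per copy
message, sent = buf.release()
print(message, sent, unbuffered)  # {'x': 2, 'y': 5} 3 9
```

Three writes to three remote copies would cost nine messages if each write were sent immediately; buffering sends three, and the intermediate value x = 1 never leaves the writer.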

  21. FOUR CONSISTENCY PROTOCOLS 1. Conventional shared variables: • Replicated on demand • Single-writer/multiple-readers policy using an invalidation-based protocol 2. Read-only variables: • Replicated on demand • Any attempt to modify them will result in a runtime error

  22. FOUR CONSISTENCY PROTOCOLS 3. Migratory variables: • Migrated among the processes accessing them • Every process accessing them always gets full read and write access 4. Write-shared variables: • Can be updated concurrently because different processes access different portions of the same page
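The difference between the conventional and migratory protocols shows up in what a write costs. This simplified sketch uses hypothetical class names: the conventional page lets readers replicate freely and invalidates every other copy on a write, while the migratory object keeps exactly one copy and moves it, with both read and write access, to whichever process touches it. Write-shared data relies on the twin-and-diff machinery described later and is not modeled here.

```python
class InvalidationPage:
    """Conventional protocol: single writer, multiple readers, invalidate on write."""

    def __init__(self):
        self.copyset = set()  # processors holding a valid copy
        self.invalidations = 0

    def read(self, proc):
        self.copyset.add(proc)  # replicate on demand

    def write(self, proc):
        self.invalidations += len(self.copyset - {proc})  # one per other copy
        self.copyset = {proc}  # the writer becomes the only holder


class MigratoryObject:
    """Migratory protocol: a single copy moves with full read/write access."""

    def __init__(self, owner):
        self.owner = owner
        self.migrations = 0

    def access(self, proc):
        if proc != self.owner:  # any access pulls the only copy over
            self.owner = proc
            self.migrations += 1


page = InvalidationPage()
for p in (0, 1, 2):
    page.read(p)
page.write(1)
print(sorted(page.copyset), page.invalidations)  # [1] 2

obj = MigratoryObject(owner=0)
for p in (1, 2, 2, 0):  # hypothetical order of critical-section entries
    obj.access(p)
print(obj.owner, obj.migrations)  # 0 3
```

For data that is read and then written by one process at a time, handing over write access together with the data avoids a separate read copy followed by an invalidation round.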

  23. Implementation • The programmer uses annotations to specify any of the last three consistency protocols • Read-only variables • Migratory variables • Write-shared variables • Incorrect annotations may result in poor performance or in runtime errors, but not in incorrect results
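An annotation only selects a protocol; it does not change what the program computes. The sketch below assumes hypothetical `Protocol` and `SharedVariable` types, with unannotated variables defaulting to the conventional protocol, and shows the one failure the slide names: writing to a variable wrongly annotated as read-only raises a runtime error instead of producing a wrong value.

```python
from enum import Enum


class Protocol(Enum):
    CONVENTIONAL = "conventional"  # used when no annotation is given
    READ_ONLY = "read_only"
    MIGRATORY = "migratory"
    WRITE_SHARED = "write_shared"


class SharedVariable:
    def __init__(self, name, value, protocol=Protocol.CONVENTIONAL):
        self.name = name
        self.value = value
        self.protocol = protocol

    def write(self, value):
        if self.protocol is Protocol.READ_ONLY:
            # A wrong read-only annotation surfaces as an error, not a bad result.
            raise RuntimeError(f"write to read-only shared variable {self.name!r}")
        self.value = value


table = SharedVariable("table", [1, 2, 3], Protocol.READ_ONLY)
counter = SharedVariable("counter", 0, Protocol.MIGRATORY)
counter.write(1)
try:
    table.write([])
except RuntimeError as err:
    print(err)  # write to read-only shared variable 'table'
```

A wrong choice among the other annotations (for example, marking data migratory when many processes read it) would only cost extra messages in a model like this one, since every protocol still delivers the written values.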

  24. WRITE-SHARED PROTOCOL (I) • Designed to fight false sharing • Uses a copy-on-write mechanism • Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write • The first attempt to modify the contents of the page creates a copy of the unmodified page (the twin) before the modification is made

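  25. Example • Before the first write access: x = 1, y = 2 • At the first write access, a twin holding x = 1, y = 2 is created • After the write: x = 3, y = 2 • Comparing the page with its twin shows that the new value of x is 3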
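The twin-and-diff step in this example can be sketched with a page modeled as a list of words (word 0 holds x, word 1 holds y). The `WriteSharedPage` class is hypothetical: it creates the twin on the first write, standing in for the copy-on-write fault, and computes the diff by a word-by-word comparison with the twin.

```python
class WriteSharedPage:
    """Copy-on-write page with a twin, as in the write-shared protocol (toy model)."""

    def __init__(self, words):
        self.words = list(words)
        self.twin = None

    def write(self, index, value):
        if self.twin is None:
            self.twin = list(self.words)  # first write: keep an unmodified copy
        self.words[index] = value

    def make_diff(self):
        """Word-by-word comparison with the twin; returns {index: new value}."""
        if self.twin is None:
            return {}  # page never written, nothing to send
        diff = {
            i: new
            for i, (old, new) in enumerate(zip(self.twin, self.words))
            if old != new
        }
        self.twin = None  # the twin's space can now be reused
        return diff


page = WriteSharedPage([1, 2])  # x = 1, y = 2
page.write(0, 3)                # x = 3
print(page.make_diff())         # {0: 3}
```

Only the changed word travels, so y's owner never has to see a copy of x.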

  26. WRITE-SHARED PROTOCOL (II) • At release time, the DSM performs a word-by-word comparison of the page and its twin, stores the diff in the space used by the twin page, and notifies all processors holding a copy of the shared data of the update • A runtime switch can be set to check for conflicting updates to write-shared data
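Because each writer sends only the words it changed, updates from processes that wrote different parts of the same page can be merged without loss. This sketch assumes diffs are `{word index: new value}` dictionaries; the `check_conflicts` flag is a hypothetical stand-in for the runtime switch and treats any word modified by more than one writer as a conflict.

```python
def make_diff(twin, page):
    """Word-by-word comparison of a page with its twin: {word index: new value}."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def merge_diffs(base, diffs, check_conflicts=False):
    """Apply the diffs of several writers to a copy of the page."""
    merged = list(base)
    written_by = {}  # word index -> writer that modified it
    for writer, diff in diffs.items():
        for index, value in diff.items():
            if check_conflicts and index in written_by:
                raise ValueError(
                    f"conflicting updates to word {index} by "
                    f"{written_by[index]} and {writer}"
                )
            written_by[index] = writer
            merged[index] = value
    return merged


original = [1, 2]  # x = 1, y = 2
p0 = [3, 2]        # processor 0 changed x
p1 = [1, 5]        # processor 1 changed y
diffs = {"P0": make_diff(original, p0), "P1": make_diff(original, p1)}
print(merge_diffs(original, diffs, check_conflicts=True))  # [3, 5]
```

With the check off, overlapping diffs are applied in order and the last writer's value wins; turning it on turns such overlaps into an explicit error.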

  27. UPDATE TIME-OUT MECHANISM • Munin tries to avoid sending updates to processors holding stale replicas • Whenever a processor receives an update for a page for which it does not have a twin, the page is marked supervisor-only and the time the update was received is recorded • The first local access to the page causes a trap that removes the restriction

  28. UPDATE TIME-OUT MECHANISM • When a process receives an update for a page that is still marked supervisor-only, it checks the timestamp of the last update • If more than 50 ms have elapsed, the process tells the originator of the update to stop sending updates and invalidates the page
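The receiver side of the time-out can be sketched as a small state machine. The slides do not say whether every update refreshes the recorded time; this sketch assumes the time is taken when the page first becomes supervisor-only, so the 50 ms measures how long updates have kept arriving without any local access. The `Replica` class and its method names are hypothetical.

```python
TIMEOUT_MS = 50  # threshold quoted on the slide


class Replica:
    """Receiver side of the update time-out mechanism (toy model)."""

    def __init__(self):
        self.valid = True
        self.has_twin = False         # True while this node is writing the page
        self.supervisor_only = False  # True until the next local access
        self.marked_at_ms = None      # assumed: when the page became supervisor-only

    def local_access(self):
        # Models the trap on the first local access, which lifts the restriction.
        self.supervisor_only = False
        self.marked_at_ms = None

    def receive_update(self, now_ms):
        """Return 'apply', or 'stop' to ask the sender to stop updating this page."""
        if not self.valid:
            return "stop"
        if self.supervisor_only and now_ms - self.marked_at_ms > TIMEOUT_MS:
            self.valid = False  # invalidate the stale replica
            return "stop"
        if not self.has_twin and not self.supervisor_only:
            self.supervisor_only = True
            self.marked_at_ms = now_ms
        return "apply"


r = Replica()
print(r.receive_update(0))   # apply: page becomes supervisor-only at t = 0
print(r.receive_update(30))  # apply: only 30 ms without a local access
print(r.receive_update(80))  # stop: 80 ms > 50 ms, replica invalidated
```

A single local access resets the clock, so a processor that keeps using the page keeps receiving updates; only replicas that nobody touches are dropped.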

  29. CONCLUSIONS (I) • The strongest point of Munin is its excellent performance • typically within 5 to 33% of the performance of hand-coded message-passing versions of the same programs • Its major limitation is its dependence on some features of the V kernel

  30. CONCLUSIONS (II) • Munin requires programs to access shared data from within critical regions or after barriers • Appears to be a reasonable requirement • Munin allows users to tune the performance of their programs by selecting the best consistency protocol for each shared variable • Can quickly become a tedious process

  31. FURTHER DEVELOPMENTS • The same team has come up with a successor to Munin named TreadMarks • Key differences are: • TreadMarks uses a more complex lazy release protocol • TreadMarks is UNIX-based • More portable
