
TreadMarks

TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. Pete Keleher, Alan Cox, Sandhya Dwarkadas, Willy Zwaenepoel. Agenda: DSM Overview, TreadMarks Overview, Vector Clocks, Multi-writer Protocol (diffs), TreadMarks Algorithm, Implementation, Limitations.

Presentation Transcript


  1. TreadMarks Distributed Shared Memory on Standard Workstations and Operating Systems Pete Keleher, Alan Cox, Sandhya Dwarkadas, Willy Zwaenepoel

  2. Agenda • DSM Overview • TreadMarks Overview • Vector Clocks • Multi-writer Protocol (diffs) • TreadMarks Algorithm • Implementation • Limitations

  3. DSM Overview [diagram: four processors, each with its own physical memory, presented as one shared address space] • Global address space virtualization of disparate physical memory • Program using normal thread/locking techniques (no MPI)

  4. DSM Overview [diagram: the same four processors and memories, now exchanging synchronization messages] • Communication overhead incurred to synchronize memory • Maximize parallel computation and limit communication to improve performance

  5. TreadMarks Overview • Minimize communications to improve DSM performance • Lazy Release Consistency (Vector Clocks) • Multiple Writers (Lazy Diff Creation) • Delay communication as long as possible (possibly even avoid)

  6. TreadMarks Overview: Release Consistency • Shared memory updates must be visible when the release is visible • No need to send updates immediately upon write [timeline diagram: writes w(x) on P1 and P2]

  7. TreadMarks Overview: Lazy Release Consistency • Shared memory updates are not made visible until the time of acquire • No update is propagated if the update is never acquired [timeline diagram: writes w(x) on P1 and P2]

  8. Vector Clocks [diagram: processes P1, P2, P3] • Logical clock mechanism for identifying the causal ordering of events in distributed systems • Mattern (1989) and Fidge (1991)

  9. Vector Clocks [diagram: P1, P2, P3 each start at (0, 0, 0)] • Each process maintains a vector of counters • One for each process in the system


  11. Vector Clocks [diagram: P1's vector becomes (1, 0, 0)] • Increments own counter upon a local event

  12. Vector Clocks [diagram: P1 at (1, 0, 0); P3 at (0, 0, 1)] • Increments own counter upon a local event

  13. Vector Clocks [diagram: P3 sends a message timestamped (0, 0, 2); P1, previously at (1, 0, 0), becomes (2, 0, 2)] • Upon receiving a message, sets each counter to the maximum of its own and the sender's value, then increments its own counter

  14. Vector Clocks [diagram: P2 receives a message timestamped (3, 0, 2) and becomes (3, 1, 2)] • Upon receiving a message, sets each counter to the maximum of its own and the sender's value, then increments its own counter
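The update rules on slides 11–14 can be sketched as a small class. This is a minimal illustration of Mattern/Fidge vector clocks, not code from TreadMarks; the class and method names are our own:

```python
class VectorClock:
    """Per-process vector clock for N processes (illustrative sketch)."""

    def __init__(self, num_procs, pid):
        self.pid = pid
        self.clock = [0] * num_procs   # one counter per process

    def local_event(self):
        # A local event increments only this process's own counter.
        self.clock[self.pid] += 1

    def send(self):
        # Sending counts as a local event; the message carries a copy.
        self.local_event()
        return list(self.clock)

    def receive(self, msg_clock):
        # Take the element-wise max of own and received counters,
        # then increment the own counter (the receive is an event).
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.clock[self.pid] += 1

    def happened_before(self, other):
        # self < other iff every counter is <= and the vectors differ.
        return (all(a <= b for a, b in zip(self.clock, other))
                and self.clock != list(other))
```

Replaying slide 13: P1 does a local event (reaching (1, 0, 0)), then receives P3's message timestamped (0, 0, 2), landing at (2, 0, 2).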

  15. Diff Creation [diagram: P1 and P2 each hold a copy of the page] • Retains a copy of the page upon the first write


  17. Diff Creation [diagram: P1 compares its modified page with the saved copy] • Create a diff by comparing the modified page against the original copy (run-length coded)

  18. Diff Creation [diagram: P1 sends the diff to P2] • Send the diff to other processes

  19. Lazy Diff Creation • Diffs created only when a page is invalidated • Or the modifications are requested explicitly (access miss on an invalidated page)
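The twin-and-diff scheme of slides 15–19 can be sketched roughly as follows. Function names and the byte-granularity run-length encoding are illustrative simplifications, not the paper's implementation:

```python
def make_twin(page: bytearray) -> bytes:
    # On the first write to a write-protected page, save a pristine copy
    # (the "twin") before allowing the write to proceed.
    return bytes(page)

def make_diff(twin: bytes, page: bytearray):
    # Run-length encode only the byte ranges that changed since the
    # twin was made: a list of (offset, new_bytes) pairs.
    diff, i = [], 0
    while i < len(page):
        if page[i] != twin[i]:
            j = i
            while j < len(page) and page[j] != twin[j]:
                j += 1
            diff.append((i, bytes(page[i:j])))
            i = j
        else:
            i += 1
    return diff

def apply_diff(page: bytearray, diff):
    # Another process patches its own copy of the page with the diff.
    for offset, data in diff:
        page[offset:offset + len(data)] = data
```

Only the modified runs travel over the network, which is what lets multiple writers update disjoint parts of the same page concurrently.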

  20. TreadMarks Algorithm [diagram: P1 at (1, 0, 0); P3 at (0, 0, 1)] • P1 cannot proceed past an acquire until: • All modifications have been received from processes whose vector timestamps are smaller than P1's

  21. TreadMarks Algorithm [diagram: P1 at (1, 0, 0) sends (1, 0, 0) to P3 at (0, 0, 1)] • On acquire: • P1 sends its vector timestamp to the releaser

  22. TreadMarks Algorithm [diagram: the releaser replies with an invalidate message] • On acquire: • P1 sends its vector timestamp to the releaser • The releaser attaches invalidations for every counter updated since P1's timestamp

  23. TreadMarks Algorithm [diagram: P1 at (1, 0, 0) receives (1, 0, 1) plus invalidations from the releaser at (0, 0, 1); P1's timestamp becomes (1, 0, 1)] • On acquire: • P1 sends its vector timestamp to the releaser • The releaser attaches invalidations for every updated counter • The releaser sends its updated vector timestamp along with the invalidations

  24. TreadMarks Algorithm [diagram: w(x) on P1; diffs exchanged after the invalidation] • Diffs generated when: • Receiving an invalidation (i.e. P1 had made prior updates to this page as well) • The page is accessed (access miss)
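The acquire exchange of slides 20–24 can be approximated as below. The message shapes are illustrative assumptions; real TreadMarks exchanges per-interval records with per-page write notices:

```python
def releaser_respond(acquirer_vc, releaser_vc, write_notices):
    """Return the acquirer's new timestamp plus invalidations for every
    interval the acquirer has not yet seen.

    write_notices maps (proc, interval_number) -> set of modified pages.
    """
    invalidations = set()
    for (proc, interval), pages in write_notices.items():
        if interval > acquirer_vc[proc]:      # acquirer missed this interval
            invalidations |= pages
    # The acquirer's timestamp advances to cover everything the
    # releaser has seen (element-wise max).
    new_vc = [max(a, r) for a, r in zip(acquirer_vc, releaser_vc)]
    return new_vc, invalidations

def acquirer_apply(valid_pages, invalidations):
    # Invalidate the pages; diffs are fetched lazily only on the
    # next access miss to an invalidated page.
    return valid_pages - invalidations
```

Note that no page data moves at acquire time: only invalidations travel, and the diffs themselves are pulled on demand.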

  25. TreadMarks Implementation: Data Structures [diagram: a per-process proc array and page array index interval records and write notice records; each interval record carries a proc_id and a vector-clock counter, and each write notice points to its page and to a diff in the diff pool]
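The structures named on slide 25 might be rendered in Python roughly as follows. Field names are guesses at intent, not the paper's C structs:

```python
from dataclasses import dataclass, field

@dataclass
class WriteNoticeRecord:
    page: int                            # page this notice invalidates
    interval: "IntervalRecord" = None    # back-pointer to its interval
    diff: bytes = None                   # filled in lazily from the diff pool

@dataclass
class IntervalRecord:
    proc_id: int
    vc_counter: int                      # this process's VC counter value
    write_notices: list = field(default_factory=list)

# Per-process view: a proc array of interval records (newest first),
# a page array mapping each page to its write notices, and a pool of
# lazily created diffs.
proc_array = {}   # proc_id -> list[IntervalRecord]
page_array = {}   # page    -> list[WriteNoticeRecord]
diff_pool = {}    # (proc_id, page, vc_counter) -> bytes
```

The cross-links matter: following a write notice reaches its interval's timestamp, and following an interval reaches every page it dirtied.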

  26. TreadMarks Implementation: Locks • Each lock is statically assigned a manager (round-robin) • The manager keeps track of the last processor to hold the lock • Lock acquires are sent to the manager (forwarded to the last processor to obtain the lock) • Upon release, sends (for each interval): • Processor ID and vector timestamp • Any invalidations that are necessary
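The static assignment and forwarding described above could be sketched as follows (class and method names are hypothetical):

```python
class LockManager:
    """Illustrative sketch of static round-robin lock management."""

    def __init__(self, num_procs):
        self.num_procs = num_procs
        self.last_holder = {}   # lock_id -> last processor to acquire it

    def manager_for(self, lock_id):
        # Locks are statically assigned to managers round-robin.
        return lock_id % self.num_procs

    def acquire(self, lock_id, requester):
        # The manager forwards the request to the last holder, which
        # replies at release time with interval records (processor ID,
        # vector timestamp) and any necessary invalidations.
        holder = self.last_holder.get(lock_id, self.manager_for(lock_id))
        self.last_holder[lock_id] = requester
        return holder
```

Forwarding to the last holder means the manager never needs to see the lock's data, only its ownership chain.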

  27. TreadMarks Implementation: Barriers • Centralized barrier manager • Upon arrival at the barrier: • Notifies the manager of intervals that the manager does not already have • Incorporated when the manager arrives at the barrier • When all clients have arrived: • Manager notifies all clients of intervals they do not already have • Expensive
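The two-phase exchange above could be approximated like this, with sets of interval identifiers standing in for real interval records:

```python
def barrier(manager_intervals, client_intervals):
    """Centralized barrier sketch (illustrative, not TreadMarks code).

    Arrival phase: each client pushes the manager the intervals the
    manager lacks.  Departure phase: once everyone has arrived, the
    manager pushes each client whatever that client is still missing.
    Returns the per-client sets of newly delivered intervals.
    """
    union = set(manager_intervals)
    for known in client_intervals:
        union |= known                       # arrivals: collect all intervals
    return [union - known for known in client_intervals]   # departures
```

Every client synchronizes with the manager twice, which is why the slide calls the scheme expensive relative to the point-to-point lock protocol.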

  28. Limitations • Achieved nearly linear speedup for the TSP, Jacobi, Quicksort, and ILINK algorithms • Water: • Each molecule in the simulation is protected by a lock and frequently accessed • Barriers are used for synchronization • Speedup is limited by the algorithm's low computation-to-communication ratio (many fine-grained messages)

  29. Limitations • TSP: • Eager Release Consistency performs better than Lazy Release Consistency (Fig. 9) • Updates occur only on invalidations and access misses (writes/synchronization points) • The TSP algorithm reads a stale 'current minimum' value without synchronization

  30. Limitations • Depends on events (writes/synchronization) to trigger consistency operations • More opportunities to read stale data (TSP) • Reduced redundancy increases the risk of data loss

  31. Summary • Improves performance by improving the computation-to-communication ratio • Delays consistency updates until the data is actually acquired or accessed • Weaker consistency implies a greater likelihood of reading stale data and of data loss • Procrastination = Performance
