Deterministic Multiprocessing

Deterministic Multiprocessing. Chris Fallin, David Lewis, Zongwei Zhou. What is Deterministic MP? A multiprocessor executes multiple threads. Threads share resources (e.g., memory). Due to bus arbiters, memory controllers, etc., some orderings in shared resources are undefined.

Presentation Transcript


  1. Deterministic Multiprocessing Chris Fallin, David Lewis, Zongwei Zhou

  2. What is Deterministic MP?
     • Multiprocessor executes multiple threads
     • Threads share resources (e.g., memory)
     • Due to bus arbiters, memory controllers, etc., the order of some accesses to shared resources is undefined
     • Problem for: debugging (reproducibility), thorough testing (many possible cases)
     • Deterministic: same input → same output

  3. Types of Determinism
     • Strong: same input → same output, regardless of race conditions
       • Must capture all communicating memory access pairs
     • Weak: same input → same output, as long as locking is correct
       • Takes advantage of locks for low SW overhead

  4. Types of Deterministic Execution
     • Record/Replay: HW/SW keeps a log of program input
       • Single-program: system calls, memory interleavings
       • Full-system: interrupts, I/O, etc.
       • Log allows later replay of a bug
       • However, separate executions may still differ when they are not replayed from a log
     • Full-time
       • Ordering of memory accesses follows a statically defined deterministic order: for the same program and the same input, the output is always the same
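
The record/replay idea on slide 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any system discussed here is implemented: `ReplayLog`, `call`, and `program` are hypothetical names, and `random.randint` merely stands in for a nondeterministic input such as a system call result.

```python
import random


class ReplayLog:
    """Records results of nondeterministic calls, or replays them in order."""

    def __init__(self, entries=None):
        # With no entries we are recording; with entries we are replaying.
        self.replaying = entries is not None
        self.entries = list(entries) if entries is not None else []
        self._cursor = 0

    def call(self, fn, *args):
        if self.replaying:
            # Return the logged value instead of running the real call.
            value = self.entries[self._cursor]
            self._cursor += 1
            return value
        value = fn(*args)
        self.entries.append(value)
        return value


def program(log):
    # Hypothetical program: its output depends on two nondeterministic inputs.
    a = log.call(random.randint, 0, 1_000_000)
    b = log.call(random.randint, 0, 1_000_000)
    return a * 31 + b


record = ReplayLog()
original = program(record)                      # recorded run
replayed = program(ReplayLog(record.entries))   # replayed run, same output
```

Two recorded runs will usually differ from each other; only a run replayed from the log is guaranteed to match the run that produced it, which is the limitation the slide notes.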

  5. DMP: Deterministic Shared Memory Multiprocessing Devietti, Lucia, Ceze, Oskin

  6. Central Idea
     • To guarantee deterministic behavior:
       - the direct way is to preserve the same global interleaving of instructions in every execution of a parallel program
       - this is unnecessary and has a significant performance impact
     • Insight: only communicating pairs matter
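
The "direct way" on slide 6 can be modeled as a toy scheduler. The sketch below is an assumption-laden illustration, not DMP's hardware mechanism: threads are Python generators, each `yield` marks the end of one "instruction", and the hypothetical `run_serialized` passes a token round-robin, letting each thread run a fixed number of instructions (a quantum) before the next thread gets its turn.

```python
def run_serialized(thread_bodies, quantum):
    """Run generator-based threads round-robin, `quantum` steps per turn.

    Returns the final value of the shared variable and the trace of
    thread ids in the order their instructions executed.
    """
    shared = {"x": 0}
    trace = []
    threads = [body(shared) for body in thread_bodies]
    live = list(range(len(threads)))
    while live:
        for tid in list(live):          # fixed order: the deterministic token
            for _ in range(quantum):
                try:
                    next(threads[tid])  # execute one "instruction"
                    trace.append(tid)
                except StopIteration:
                    live.remove(tid)    # thread finished
                    break
    return shared["x"], trace


def adder(shared):
    for _ in range(3):
        shared["x"] += 1
        yield


def doubler(shared):
    for _ in range(2):
        shared["x"] *= 2
        yield


result = run_serialized([adder, doubler], quantum=2)
```

Every call with the same quantum produces the same interleaving and the same result, but only one thread ever makes progress at a time, which is the performance cost the slide refers to.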

  7. Improve a bit...
     • Not all memory accesses are communicating
       • can parallelize the communication-free portion of each quantum
       • need to know when communication happens!
       • MESI cache coherence protocol provides this for free
     • DMP Sharing Table
       - tracks info about memory ownership
       - two ownership change possibilities:
         - reading data owned by others
         - writing data to shared memory
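
A sharing table of the kind slide 7 describes might be modeled as below. This is a hedged sketch: the class and method names are hypothetical, the first-touch rule (untouched data becomes private to the first thread that accesses it) is an assumption made for the example, and the exact ownership transitions should be checked against the DMP paper.

```python
SHARED = "shared"


class SharingTable:
    """Tracks, per address, whether data is private to a thread or shared."""

    def __init__(self):
        self.owner = {}  # address -> thread id, or SHARED

    def is_communication(self, tid, addr, is_write):
        owner = self.owner.get(addr)
        if owner is None or owner == tid:
            return False        # private (or untouched) data: no communication
        if owner == SHARED:
            return is_write     # reading shared data is free, writing is not
        return True             # data owned by another thread

    def access(self, tid, addr, is_write):
        """Apply the ownership change; return True if the access communicated."""
        communicated = self.is_communication(tid, addr, is_write)
        owner = self.owner.get(addr)
        if owner is None:
            self.owner[addr] = tid  # assumption: first touch makes data private
        elif communicated:
            # A write takes ownership; a read of another thread's data shares it.
            self.owner[addr] = tid if is_write else SHARED
        return communicated


table = SharingTable()
events = [
    table.access(0, 0x10, True),   # thread 0 writes untouched data
    table.access(1, 0x10, False),  # thread 1 reads data owned by thread 0
    table.access(0, 0x10, False),  # thread 0 reads data that is now shared
    table.access(1, 0x10, True),   # thread 1 writes shared data
]
```

Only the accesses that return `True` would need to wait for the deterministic order; the others could run in parallel within a quantum.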

  8. Improve a bit more...
     • Transactional Memory + deterministic commit order
       • TM: atomicity and isolation of each quantum
       • Speculation: find quanta not involved in communication
       • If communication happens, squash + re-execute
     • Potential optimization:
       • forward uncommitted (or speculative) data between quanta
       • could save a large number of squashes
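
The squash-and-re-execute scheme on slide 8 can be illustrated with a small software model. This is a sketch under simplifying assumptions, not the hardware design: `execute` and `run_round` are hypothetical names, each quantum is a Python function that receives `read` and `write` callbacks, all quanta in a round speculate against the same snapshot, and commits happen in list order.

```python
def execute(quantum, base):
    """Run one quantum against `base`, buffering writes and tracking reads."""
    reads, writes = set(), {}

    def read(addr):
        if addr in writes:          # a quantum sees its own buffered writes
            return writes[addr]
        reads.add(addr)
        return base.get(addr, 0)

    def write(addr, value):
        writes[addr] = value        # buffered until commit

    quantum(read, write)
    return reads, writes


def run_round(memory, quanta):
    """Speculate every quantum on a snapshot, then commit in list order.

    Returns the number of squashed quanta.
    """
    snapshot = dict(memory)
    committed_writes = set()
    squashes = 0
    for quantum in quanta:
        reads, writes = execute(quantum, snapshot)
        if reads & committed_writes:
            # The quantum read data an earlier quantum wrote: squash and
            # re-execute against the committed state.
            squashes += 1
            reads, writes = execute(quantum, memory)
        memory.update(writes)
        committed_writes.update(writes)
    return squashes


memory = {"a": 1, "b": 2}
quanta = [
    lambda read, write: write("a", read("a") + 1),
    lambda read, write: write("b", read("a") * 10),  # reads what quantum 0 writes
    lambda read, write: write("c", 5),               # communication-free
]
squashes = run_round(memory, quanta)
```

Because the commit order is fixed, the final memory contents are the same as if the quanta had run serially in list order, and only the quantum that actually communicated pays for a re-execution.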

  9. Performance   

  10. Discussion
      • Speculation
        • similar idea to TLS (thread-level speculation), but used for the opposite purpose
        • requires complex hardware
        • I/O or parts of the OS cannot execute speculatively
      • Dealing with nondeterminism
        • threads can use the OS to communicate
        • nondeterministic OS API calls, e.g. read
      • Better way of token-passing?

  11. Kendo: Efficient Deterministic Multithreading in Software Olszewski, Ansel, Amarasinghe

  12. Definitions
      • Strong Determinism
        • Deterministic order of memory accesses to shared data for a particular program input
        • ALWAYS produces the same output for every run with a particular input
        • Not easily provided without hardware support
      • Weak Determinism
        • Deterministic order of lock acquisitions for a given program input
        • Produces the same output for every run if race-free
        • Can be guaranteed if all accesses to shared data are protected by locks
      • If there are no data races, strong and weak determinism provide the same guarantees!

  13. Introducing Kendo
      • Software framework to enforce weak determinism of general lock-based C/C++ code for commodity shared-memory multiprocessors
        • No special hardware necessary!
      • Deterministic Logical Time
        • Each thread has its own monotonically increasing deterministic logical clock
        • How to implement? Performance counter events?
      • When is it a thread T's turn to use a lock?
        • All threads with tid < T have greater logical clocks
        • All threads with tid > T have greater or equal logical clocks
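
The turn rule on slide 13 amounts to "lowest logical clock wins, ties go to the lowest thread id". A minimal sketch, assuming the clocks are available as a list indexed by thread id (the function name `is_turn` is hypothetical and this is not Kendo's actual code):

```python
def is_turn(tid, clocks):
    """Return True if thread `tid` may take its turn.

    `clocks[i]` is the deterministic logical clock of thread i.
    """
    mine = clocks[tid]
    for other, clock in enumerate(clocks):
        if other < tid and clock <= mine:
            return False    # a lower-id thread must be strictly ahead of us
        if other > tid and clock < mine:
            return False    # a higher-id thread must not be behind us
    return True


# With a tie between threads 0 and 1, the lower id gets the turn.
turns = [is_turn(t, [5, 5, 7]) for t in range(3)]
```

For any set of clock values exactly one thread satisfies the rule, so the order of turns depends only on the logical clocks and not on physical timing.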

  14. Simple Locking Mechanism

      function det_mutex_lock(l) {
        pause_logical_clock();
        wait_for_turn();
        lock(l);
        inc_logical_clock();
        resume_logical_clock();
      }

      function det_mutex_unlock(l) {
        unlock(l);
      }

      • Simple algorithm for implementing locks
        • Pause the logical clock during acquisition and wait for our turn to access the lock (using the turn rule from the previous slide)
        • Once in the critical section, resume the clock and continue
      • Pros:
        • Easy to implement
      • Problems?

  15. Improved Lock

      function det_mutex_lock(l) {
        pause_logical_clock();
        while (true) {                // Loop until we have successfully acquired the lock
          wait_for_turn();            // Wait for our deterministic logical clock to be the unique global minimum
          if (try_lock(l)) {          // Check the state of the lock, acquiring it if it is free
            if (l.released_logical_time >= get_logical_clock()) {
              // Lock is free in physical time, but still acquired in
              // deterministic logical time, so we cannot acquire it yet
              unlock(l);              // Release the lock
            } else {
              // Lock is free in both physical and deterministic logical
              // time, so it is safe to exit the spin loop
              break;
            }
          }
          inc_logical_clock();        // Increment our deterministic logical clock and start over
        }
        inc_logical_clock();          // Increment our deterministic logical clock before exiting
        resume_logical_clock();
      }

      function det_mutex_unlock(l) {
        pause_logical_clock();
        l.released_logical_time = get_logical_clock();
        unlock(l);
        inc_logical_clock();
        resume_logical_clock();
      }

  16. Optimizations
      • Queuing
        • A queue for each lock guarantees first-come, first-served order
      • Fast-forwarding
        • While waiting for a lock, a thread can set its logical time to lock.released_logical_time (or +1 if queuing)
      • Lazy reads
        • If the application can tolerate reading out-of-date shared data, there is no need to lock on read (e.g. finding a "best" value)
        • Provide a read window (in logical time): if all threads are past the earliest allowable logical time, the read can succeed

  17. Results

  18. Capo: A Software-Hardware Interface for Practical Deterministic Multiprocessor Replay Montesinos, Hicks, King, Torrellas

  19. Capo: Motivation
      • Record/replay system for debugging
        • Not intended to be deployed in the field
      • Builds on DeLorean [1]
        • Chunk-based record/replay system
        • Terminate chunks at communicating pairs, record chunk commit order only
        • Only half the story
      • Capo adds the software side as a Linux implementation:
        • Record syscall results
        • Provide infrastructure to record/replay multiple programs and multiplex hardware record/replay features

      [1] P. Montesinos, L. Ceze, and J. Torrellas, “DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently,” in ISCA, June 2008.

  20. Capo's Contributions
      • Replay Spheres: distinct realms of record/replay
      • Defining hardware-software interface
      • Simulated DeLorean hardware (chunk-based recording)
      • Linux kernel modifications

  21. Capo Architecture
      • Replay Sphere: set of R-threads; isolated environment
        • An arbitrary set of processes is inside a sphere
      • Replay Sphere Mgr: multiplexes HW support over spheres
      • HW: records chunk commit order (DeLorean)
      • SW: records system calls
      • OS not inside sphere, except copy_to_user()

  22. Hardware Details

  23. Performance (Record, Replay)

  24. Log Size

  25. Summary (Devietti et al)
      • Systems compared: Capo (record/replay), Kendo, DMP
      • Helps with…: debugging, testing, replicas, deployment
      • Needs HW: Capo usually, Kendo no, DMP yes

  26. Discussion
      • Which is more useful: record/replay or full-time?
        • Debugging only, vs. system design philosophy
        • Tradeoff: cost (log size, overhead) vs. utility
      • Strong vs. weak determinism
        • Race conditions are an important class of bugs
