
Kendo: Efficient Deterministic Multithreading in Software

Kendo: Efficient Deterministic Multithreading in Software. M. Olszewski, J. Ansel, S. Amarasinghe (MIT). To be presented at ASPLOS 2009. Slides by Evangelos.

Presentation Transcript


  1. Kendo: Efficient Deterministic Multithreading in Software • M. Olszewski, J. Ansel, S. Amarasinghe (MIT) • To be presented at ASPLOS 2009 • Slides by Evangelos

  2. Motivation • Parallel applications • Non-determinism is inherent in threaded applications • This makes them hard to develop, debug, test, and maintain • Approach: modify the running environment so that the parallel application runs deterministically • Make thread communication through shared memory deterministic • Deterministic interleaving of lock acquisitions

  3. Deterministic Multithreading • Strong Determinism • Same output for every run on a given input, even in the presence of data races; too costly • Weak Determinism • Same output for every run on a given input, provided that input leads to a race-free execution under the deterministic scheduler

  4. Benefits of Deterministic Multithreading • Repeatability • Closest existing approach: record/replay systems, which provide determinism only for a single recorded run • Debugging • Enables cyclic debugging: re-running the program repeatedly to narrow down a bug • Testing • Check the output or intermediate states of a program to verify correctness • Multithreaded Replicas • Replica-based fault tolerance • Give the same input to all replicas and expect the same behavior

  5. Deterministic Logical Time • P monotonically increasing clocks, one per thread • Each clock counts events that are repeatable across executions • e.g. writes performed, instructions committed • Serves as a measure of each thread's progress • The thread interleaving (order of lock acquisitions) is decided from logical time
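To make the idea concrete, here is a minimal Python sketch of per-thread logical clocks. It assumes a toy model in which every "store" is an explicit call to a hypothetical `count_store()` helper (Kendo itself reads a hardware counter). Random sleeps perturb the physical schedule, yet each thread's logical time at its checkpoint is the same on every run.

```python
import random
import threading
import time


class LogicalClocks:
    """Hypothetical toy model of per-thread deterministic logical clocks.

    Each thread owns one monotonically increasing counter that advances only
    on a repeatable event (here a simulated store), never on wall-clock time.
    """

    def __init__(self, num_threads):
        self.clocks = [0] * num_threads

    def count_store(self, tid):
        # Only thread `tid` ever writes its own slot, so no lock is needed.
        self.clocks[tid] += 1


def run_once(stores_per_thread):
    clocks = LogicalClocks(len(stores_per_thread))
    checkpoints = [None] * len(stores_per_thread)

    def worker(tid):
        for _ in range(stores_per_thread[tid]):
            time.sleep(random.uniform(0, 0.001))  # physical timing varies per run
            clocks.count_store(tid)
        checkpoints[tid] = clocks.clocks[tid]  # logical time at the checkpoint

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(len(stores_per_thread))]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return checkpoints


print(run_once([3, 5, 2]))  # [3, 5, 2] on every run, whatever the scheduling
```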

  6. Simplified Locking Algorithm • At any given point, it is exactly one thread's turn to acquire a lock. It is thread t's turn when: • All threads with a smaller ID have greater deterministic logical clocks than t • All threads with a larger ID have greater or equal deterministic logical clocks • Waiting for the turn enforces a first-come-first-served ordering of threads in logical time
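The turn rule can be written as a pure function over a snapshot of all clocks. This is a sketch; `is_my_turn` and `whose_turn` are hypothetical names. The rule amounts to picking the thread with the lexicographically smallest (clock, thread ID) pair, which is why exactly one thread holds the turn at any moment.

```python
def is_my_turn(tid, clocks):
    """Return True if thread `tid` may try to acquire a lock now.

    Rule from the slide: every thread with a smaller ID must have a strictly
    greater clock, and every thread with a larger ID a greater-or-equal clock.
    Ties in logical time are therefore broken in favour of the smaller ID.
    """
    mine = clocks[tid]
    for other, clock in enumerate(clocks):
        if other < tid and not clock > mine:
            return False
        if other > tid and not clock >= mine:
            return False
    return True


def whose_turn(clocks):
    # Exactly one thread satisfies the rule for any vector of clocks.
    owners = [t for t in range(len(clocks)) if is_my_turn(t, clocks)]
    assert len(owners) == 1
    return owners[0]


print(whose_turn([5, 3, 3]))  # 1: smallest clock, tie with thread 2 broken by ID
print(whose_turn([4, 7, 4]))  # 0: tie with thread 2, thread 0 has the smaller ID
```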

  7. Pseudocode for simplified locking algorithm
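The slide's pseudocode did not survive extraction. The sketch below is our own reconstruction of the simplified scheme from the previous slide, not the authors' code: wait for the turn with the clock paused, acquire the lock, then advance the clock. All names (`DetRuntime`, `det_lock`, `tick`, and so on) are hypothetical, clocks advance through explicit `tick()` calls instead of a hardware counter, and exited threads set their clock to infinity so they never block anyone's turn (also an assumption).

```python
import threading
import time

INF = float("inf")


class DetRuntime:
    """Hypothetical toy runtime; clocks advance via tick(), not hardware."""

    def __init__(self, num_threads):
        self.clocks = [0] * num_threads

    def tick(self, tid, amount=1):
        self.clocks[tid] += amount

    def finish(self, tid):
        self.clocks[tid] = INF  # an exited thread must never block a turn

    def is_my_turn(self, tid):
        mine = self.clocks[tid]
        return all(
            (c > mine) if other < tid else (c >= mine)
            for other, c in enumerate(self.clocks)
            if other != tid
        )

    def wait_for_turn(self, tid):
        # The clock stays paused while waiting: tick() is not called here.
        while not self.is_my_turn(tid):
            time.sleep(0)  # yield to other Python threads

    def det_lock(self, tid, lock):
        self.wait_for_turn(tid)
        lock.acquire()  # may block, but only behind a holder that had an earlier turn
        self.tick(tid)

    def det_unlock(self, tid, lock):
        lock.release()


def run_once(work):
    """work[tid] lists the clock ticks of 'computation' before each lock."""
    rt = DetRuntime(len(work))
    lock = threading.Lock()
    log = []

    def worker(tid):
        for amount in work[tid]:
            rt.tick(tid, amount)  # simulated computation between critical sections
            rt.det_lock(tid, lock)
            log.append(tid)
            rt.det_unlock(tid, lock)
        rt.finish(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(len(work))]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return log


print(run_once([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))  # [0, 1, 0, 2, 0, 1, 2, 1, 2]
```

Lock acquisitions happen in order of (logical time of the request, thread ID), so the log is identical on every run. Note the blocking `lock.acquire()` while the thread holds its turn: if the current holder of that lock itself needed a turn (for example, to take a second, nested lock), the waiter's paused clock would stall it forever.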

  8. Improved Locking Algorithm

  9. Improved Locking Algorithm
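The figures for these two slides are also missing. The sketch below follows the improved algorithm as we understand it from the Kendo paper, so treat the details as an assumption: a non-blocking try-lock replaces the blocking acquire, a failed attempt advances the clock so other threads can take their turns, and a lock whose recorded `released_logical_time` is not earlier than the acquirer's clock counts as still held. All names are hypothetical. The workload nests two locks, the case that could stall the simplified scheme.

```python
import threading
import time

INF = float("inf")


class DetLock:
    """Hypothetical: a mutex plus the logical time of its last release."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.released_logical_time = -1  # never released yet


class DetRuntime:
    """Hypothetical toy runtime; clocks advance via tick(), not hardware."""

    def __init__(self, num_threads):
        self.clocks = [0] * num_threads

    def tick(self, tid, amount=1):
        self.clocks[tid] += amount

    def finish(self, tid):
        self.clocks[tid] = INF  # an exited thread never blocks another's turn

    def is_my_turn(self, tid):
        mine = self.clocks[tid]
        return all(
            (c > mine) if other < tid else (c >= mine)
            for other, c in enumerate(self.clocks)
            if other != tid
        )

    def wait_for_turn(self, tid):
        while not self.is_my_turn(tid):
            time.sleep(0)  # clock stays paused while waiting

    def det_lock(self, tid, lock):
        while True:
            self.wait_for_turn(tid)
            # try_lock: never block while holding the turn
            if lock.mutex.acquire(blocking=False):
                if lock.released_logical_time >= self.clocks[tid]:
                    # Released "in the logical future": behave as if still held.
                    lock.mutex.release()
                else:
                    break
            self.tick(tid)  # failed attempt: let other threads take a turn
        self.tick(tid)

    def det_unlock(self, tid, lock):
        lock.released_logical_time = self.clocks[tid]
        lock.mutex.release()
        self.tick(tid)


def run_once(work):
    """Each thread nests two locks per iteration; returns the logged order."""
    rt = DetRuntime(len(work))
    outer, inner = DetLock(), DetLock()
    log = []

    def worker(tid):
        for amount in work[tid]:
            rt.tick(tid, amount)  # simulated computation
            rt.det_lock(tid, outer)
            rt.tick(tid)  # work done while holding `outer`
            rt.det_lock(tid, inner)  # needs a turn while still holding `outer`
            log.append(tid)
            rt.det_unlock(tid, inner)
            rt.det_unlock(tid, outer)
        rt.finish(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(len(work))]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return log


first = run_once([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(first)
```

Determinism here rests on the release-time check: whether a lock has physically been released yet depends on scheduling, but whether it was released before the acquirer's logical time does not.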

  10. Optimizations • Queueing for fairness • Each lock has a queue • The thread at the head of the queue gets the lock; other threads spin, incrementing their logical clocks • Deterministic logical clock fast-forwarding • A thread advances its clock directly to lock.released_logical_time instead of spinning up to it • Lock priority boosting (?) • If the next thread to acquire a lock can be predicted, decrease its clock to give it higher priority
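Fast-forwarding can be illustrated with a small pure function. The slide says the thread advances its clock to `lock.released_logical_time`; jumping to one past that value, so that the retry passes a "released strictly before my clock" check, is our assumption, as are the function names. Fast-forwarding only applies once the lock has been released and its release time is known.

```python
def clock_after_failed_attempt(my_clock, released_logical_time, fast_forward=True):
    """Next clock value for a thread whose lock attempt just failed.

    Hypothetical helper. Without fast-forwarding the thread adds 1 per failed
    attempt. With it, the thread jumps past the lock's release time, since
    every attempt at a clock <= released_logical_time would fail anyway
    (the +1 is an assumption, not taken from the slides).
    """
    if fast_forward:
        return max(my_clock + 1, released_logical_time + 1)
    return my_clock + 1


def attempts_until_success(my_clock, released_logical_time, fast_forward):
    # An attempt fails while the lock was released "in the logical future".
    attempts = 1
    while released_logical_time >= my_clock:
        my_clock = clock_after_failed_attempt(my_clock, released_logical_time, fast_forward)
        attempts += 1
    return attempts


print(attempts_until_success(10, 1000, fast_forward=False))  # 992
print(attempts_until_success(10, 1000, fast_forward=True))   # 2
```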

  11. Implementation • Deterministic Logical Clocks • The retire_stores hardware performance counter counts stores; on each overflow, an interrupt handler increments the software counter maintained in shared memory • Chunk size: number of stores needed to cause an overflow • A small chunk size means higher overhead from more frequent interrupt handlers • Increment amount: controls the fidelity of the logical clock • The increment can differ between counter overflows and lock acquisition attempts
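The chunk-size trade-off can be simulated without hardware. In this hypothetical model, `ChunkedLogicalClock` stands in for the `retire_stores` counter plus its overflow handler, using an increment of 1 per overflow; the real counter programming is not shown.

```python
class ChunkedLogicalClock:
    """Hypothetical model of a logical clock driven by an overflowing counter.

    A hardware counter of retired stores "overflows" every `chunk_size`
    stores; the handler then adds `increment` to a software clock. Stores
    made since the last overflow are not reflected in the clock.
    """

    def __init__(self, chunk_size, increment=1):
        self.chunk_size = chunk_size
        self.increment = increment
        self.hw_counter = 0      # stores since the last overflow
        self.logical_clock = 0   # software clock kept in shared memory
        self.interrupts = 0      # handler invocations (the overhead)

    def retire_store(self):
        self.hw_counter += 1
        if self.hw_counter == self.chunk_size:  # counter overflow
            self.hw_counter = 0
            self.interrupts += 1
            self.logical_clock += self.increment


def simulate(stores, chunk_size):
    clock = ChunkedLogicalClock(chunk_size)
    for _ in range(stores):
        clock.retire_store()
    return clock.logical_clock, clock.interrupts


print(simulate(10_000, 10))    # (1000, 1000): fine-grained clock, many interrupts
print(simulate(10_000, 1000))  # (10, 10): few interrupts, but a coarse clock
```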

  12. Implementation • Thread Creation • New threads must be created carefully • The parent thread needs to wait for its turn before creating the new thread • Lazy reads (unprotected reads) • Provide an API for deterministically reading unprotected data; writes are always done while holding a lock • Keep a table of all <value, logical time> pairs
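A table of <value, logical time> pairs can back deterministic lazy reads. In this sketch, `LazyReadCell` is a hypothetical name, and we assume a read at logical time t returns the latest write made strictly before t; the exact visibility rule Kendo uses may differ.

```python
import bisect


class LazyReadCell:
    """Hypothetical cell supporting deterministic unprotected reads.

    Writes (always made while holding a lock, so they arrive in logical-time
    order) are recorded as (logical_time, value) pairs. A read at logical
    time t returns the value of the latest write made strictly before t,
    regardless of when that write physically happened (assumed semantics).
    """

    def __init__(self, initial):
        self.initial = initial
        self.times = []
        self.values = []

    def write(self, logical_time, value):
        if self.times and logical_time < self.times[-1]:
            raise ValueError("writes must arrive in logical-time order")
        self.times.append(logical_time)
        self.values.append(value)

    def read(self, logical_time):
        # Number of writes with time < logical_time, minus one, is the index.
        idx = bisect.bisect_left(self.times, logical_time) - 1
        return self.initial if idx < 0 else self.values[idx]


cell = LazyReadCell(0)
cell.write(2, "a")
cell.write(5, "b")
print([cell.read(t) for t in (1, 3, 5, 6)])  # [0, 'a', 'a', 'b']
```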

  13. Evaluation • 2.66 GHz Intel Core 2 Quad running Debian • SPLASH-2 benchmark suite • Also parallel traveling salesman (TSP) and parallel quicksort

  14. Evaluation

  15. Evaluation

  16. Evaluation

  17. Conclusions • Software-only solution providing weak deterministic multithreading • Makes the interleaving of lock acquisitions deterministic • Low overhead (16%) for up to four threads (?) on the SPLASH-2 benchmarks
