This paper presents Probabilistic Replay via Execution Sketching (PRES), an approach to efficiently reproducing bugs on multiprocessors by recording a partial ordering of events during production runs and intelligently exploring the space of executions consistent with that record. The architecture comprises Sketch Recorders, a PI-Replayer, a Replay Recorder, a Monitor, and a Feedback Generator. The replayer uses feedback from failed attempts to guide later ones, so that a bug can be reproduced in a small number of replays. The technique trades recording efficiency against replay accuracy.
PRES: Probabilistic Replay with Execution Sketching on Multiprocessors • Soyeon Park and Yuanyuan Zhou (UCSD) • Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu H. Lee, and Shan Lu (UIUC) • SOSP 2009 • LBA Reading Group 9/15/09 • Presented by: Michelle Goodstein
Outline • Motivation • PRES Architecture • Capturing Sketches • Replaying Intelligently • Evaluation • Conclusion
Motivation • Concurrency bugs are hard… • Deterministic replay can help, but… • Deterministic replay can be expensive • What if we record only partial information? • Good enough to reproduce the bug, rather than the actual execution • Reproduce the bug in a small number (5-50) of replays rather than on the first attempt?
PRES • Probabilistic Replay via Execution Sketching • Records partial ordering during production run • Intelligently explores space of partial orderings • Use feedback from failed attempts to reproduce bug in subsequent explorations
PRES Architecture • Sketch Recorders • Partial Information based Replayer (PI-Replayer) • Replay Recorder • Monitor • Feedback Generator
PRES Architecture • Sketch Recorders • During production run • Captures partial ordering of events • Balance of efficiency and usefulness in replay
PRES Architecture • Partial Information based Replayer (PI-Replayer) • During bug reproduction phase • Consults the sketch and the feedback from earlier attempts to reproduce the bug • Sketch specifies ordering → do what the sketch prescribes • Feedback says ordering did not produce bug → do something else • No info available → execute however desired
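The three rules on this slide can be read as a priority order at each scheduling decision. The sketch below is a minimal illustration under assumptions, not the PRES implementation: `choose_next_thread`, `sketch_choice`, and `failed_choices` are hypothetical names, and a scheduling decision is modeled simply as picking one runnable thread id.

```python
import random


def choose_next_thread(runnable, sketch_choice=None, failed_choices=frozenset(), rng=None):
    """Pick the thread to run next at one scheduling decision.

    runnable       -- thread ids that could run now
    sketch_choice  -- thread the sketch orders next, or None if the sketch is silent
    failed_choices -- choices that feedback says did not reproduce the bug here
    (All three parameters are illustrative assumptions.)
    """
    if not runnable:
        raise ValueError("no runnable threads")
    rng = rng or random.Random()

    # Rule 1: the sketch specifies an ordering, so follow it.
    if sketch_choice is not None:
        if sketch_choice not in runnable:
            raise RuntimeError("replay cannot follow the sketch (off-sketch)")
        return sketch_choice

    # Rule 2: feedback rules out some orderings, so try something else.
    untried = [t for t in runnable if t not in failed_choices]
    if untried:
        return rng.choice(untried)

    # Rule 3: nothing left to steer by, so any runnable thread will do.
    return rng.choice(list(runnable))
```

With no sketch entry and no feedback, rule 2 degenerates to a free choice among all runnable threads, which matches the "execute however desired" case.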
PRES Architecture • Replay Recorder • Deterministic replay recorder • Necessary to produce feedback • When bug reproduced, have a deterministic record of how to repeat with 100% probability
PRES Architecture • Monitor: • Tracks replays and detects: • Deviations from sketch (new replay necessary) • Bug reproduced (success!)
PRES Architecture • Feedback Generator • Uses info from recorder to provide feedback for future replay attempts • Try to figure out why bug not discovered
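Taken together, the five components form a retry loop. The following is a rough sketch of that loop, assuming hypothetical callbacks: `run_replay` stands in for the PI-Replayer, Replay Recorder, and Monitor combined, and `make_feedback` stands in for the Feedback Generator. The default cap of 50 attempts echoes the 5-50 replay range from the Motivation slide.

```python
def reproduce_bug(run_replay, make_feedback, max_attempts=50):
    """Retry replays until the bug shows up or the attempt budget runs out.

    run_replay(feedback) -> (outcome, recording)   # hypothetical callback
        outcome is "bug_reproduced" or anything else for a failed attempt;
        recording is the full record of that attempt.
    make_feedback(recording) -> hint or None       # hypothetical callback
    Returns (attempt_number, recording) on success, or None on failure.
    """
    feedback = []
    for attempt in range(1, max_attempts + 1):
        outcome, recording = run_replay(feedback)
        if outcome == "bug_reproduced":
            # The recording of this attempt is a deterministic recipe
            # for repeating the failure.
            return attempt, recording
        hint = make_feedback(recording)
        if hint is not None:
            feedback.append(hint)
    return None
```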
Sketch Recorders • Baseline (Base): everything necessary for deterministic replay on a uniprocessor • Synchronization recorder (Sync): Base + global order of high-level synchronization operations • System call recorder (Sys): Sync + global order of system calls • Function call recorder (Func): global order of all function calls (Michelle: also + above???) • Nth basic block recorder (BB-n): records every nth basic block executed (the count is global) • Basic block recorder (BB): global order of all basic blocks • Shared reads/writes recorder (RW): global order of all shared reads and writes, i.e. standard deterministic replay
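One way to picture the recorder levels is as filters over the same global event trace: finer levels keep more event kinds. The sketch below is illustrative only. The `Event` type and `record_sketch` function are hypothetical, and it assumes that every level also keeps what the cheaper levels keep, which the slide itself flags as uncertain for Func.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    thread: int
    kind: str   # "sync", "syscall", "func", "bb", "read", or "write"
    label: str


# Assumption: levels are cumulative (each keeps what the cheaper ones keep).
RECORDED_KINDS = {
    "Base": set(),
    "Sync": {"sync"},
    "Sys": {"sync", "syscall"},
    "Func": {"sync", "syscall", "func"},
    "BB": {"sync", "syscall", "func", "bb"},
    "RW": {"sync", "syscall", "func", "bb", "read", "write"},
}


def record_sketch(trace, level, n=None):
    """Return the globally ordered subsequence of `trace` that `level` would log."""
    if level == "BB-n":
        if n is None or n < 1:
            raise ValueError("BB-n needs a positive n")
        kept = []
        bb_count = 0  # one counter shared by all threads (the count is global)
        for ev in trace:
            if ev.kind == "bb":
                bb_count += 1
                if bb_count % n == 0:
                    kept.append(ev)
            elif ev.kind in RECORDED_KINDS["Func"]:
                kept.append(ev)
        return kept
    return [ev for ev in trace if ev.kind in RECORDED_KINDS[level]]
```

The shorter the returned list, the cheaper the production run, and the more orderings the replayer has to guess.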
Replaying Intelligently • Monitor observes the current replay • Compares current replay to sketch to notice when to abort • Inconsistent or off-sketch • Bug reproduced • Operates only on visible events • Exceptions, timer signals, outputs
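The monitor's two stopping conditions can be sketched as a single pass over the replayed events. This is a simplified illustration with assumed names: events are `(thread, kind, label)` tuples, `is_recorded` says whether the sketch level would have logged an event, and `bug_observed` is a stand-in for detecting the failure from a visible event.

```python
def monitor_replay(replay_events, sketch, is_recorded, bug_observed):
    """Scan a replay and report why it should stop.

    Returns (status, index) where status is one of
    "bug_reproduced", "off_sketch", or "no_bug", and index is the position
    in replay_events where the scan stopped.
    """
    pos = 0  # next sketch entry the replay is expected to match
    for index, ev in enumerate(replay_events):
        if bug_observed(ev):
            return "bug_reproduced", index
        if is_recorded(ev):
            # A recorded event must match the sketch in order; otherwise
            # this replay has left the sketch and a new one is needed.
            if pos >= len(sketch) or sketch[pos] != ev:
                return "off_sketch", index
            pos += 1
    return "no_bug", len(replay_events)
```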
Replaying Intelligently • Unsuccessful replays • Sketches that are not RW miss some shared memory data races • If race occurs in certain orders, bug may not manifest • Idea: use info (feedback) from prior runs to guide choice of ordering in next replay attempt
Replaying Intelligently: Generating Feedback • Need to do full RW recording of each replay attempt • Using failed replay recordings, identify data races • Filter out data races whose ordering the sketch already implies • Select a data race whose ordering to invert • Heuristic: choose a replay recording, then the race closest to the fault • On next replay, execute deterministically until the data race is encountered, then flip its order • Then, default PI-Replayer behavior takes over
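The race-selection step can be sketched on a single failed recording. The code below is a deliberately simplified illustration, not the paper's algorithm: it treats consecutive conflicting accesses to one address from different threads as candidate races (a real detector would use a happens-before analysis), it takes the sketch's ordering knowledge as a hypothetical `ordered_by_sketch` predicate, and it skips the step of choosing among several failed recordings.

```python
from typing import NamedTuple


class Access(NamedTuple):
    thread: int
    op: str    # "r" for read, "w" for write
    addr: str


def candidate_races(recording):
    """Pairs (i, j), i < j, of consecutive conflicting accesses to one address."""
    last_by_addr = {}  # addr -> index of the most recent access to it
    races = []
    for j, acc in enumerate(recording):
        i = last_by_addr.get(acc.addr)
        if i is not None:
            prev = recording[i]
            # Conflict: different threads, and at least one side writes.
            if prev.thread != acc.thread and "w" in (prev.op, acc.op):
                races.append((i, j))
        last_by_addr[acc.addr] = j
    return races


def pick_race_to_flip(recording, ordered_by_sketch, fault_index=None):
    """Return the unordered race (i, j) closest to the fault, or None."""
    if fault_index is None:
        fault_index = len(recording)  # assume the fault follows the last access
    races = [
        (i, j)
        for (i, j) in candidate_races(recording)
        if j < fault_index and not ordered_by_sketch(i, j)
    ]
    if not races:
        return None
    return max(races, key=lambda pair: pair[1])
```

Given a returned pair `(i, j)`, the next attempt would replay the recording deterministically up to access `i`, let access `j` go first, and then hand control back to the default PI-Replayer behavior.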
Conclusion • Interesting use of partial orders as a compromise between recording efficiency and replay accuracy • Partial information is often sufficient to recover the buggy ordering • Similarities to the CHESS paper presented earlier