
Jordan Adamek Mikhail Nesterenko Sébastien Tixeuil

Symposium on Stabilization, Safety, and Security of Distributed Systems. Evaluating Practical Tolerance Properties of Stabilizing Programs Through Simulation The Case of Propagation of Information with Feedback. Jordan Adamek Mikhail Nesterenko Sébastien Tixeuil. Toronto, Canada.


Presentation Transcript


  1. Symposium on Stabilization, Safety, and Security of Distributed Systems
     Evaluating Practical Tolerance Properties of Stabilizing Programs Through Simulation: The Case of Propagation of Information with Feedback
     Jordan Adamek, Mikhail Nesterenko, Sébastien Tixeuil
     Toronto, Canada, October 2012

  2. Why Simulate Stabilization
     • A stabilizing program has to recover from an arbitrary system state.
     • To prove the algorithm correct, the designer has to focus on stabilization from degenerate states that are rarely reached in practice.
     • Such an exercise tells little about the algorithm’s practical performance.
     • Performance evaluations in the area of stabilization are relatively rare, and they present unique challenges.
     What to consider?
     • states: randomization is a common answer. Yet uniformly randomized states may be “mild”: they distribute process states evenly and may not represent systemic faults
     • execution models: the model needs to be realistic, yet the results should pertain to the algorithm and not be an artifact of the model
     • parameters: stabilization time is common, yet it often hides the complexity of failure recovery. Other parameters need to be considered.
     • We simulate stabilizing PIF and analyze its performance using realistic initial states and three classic execution models, and compute a number of stabilization parameters.

  3. Outline • PIF algorithm • parameter selection and experiment setup • results • analysis • conclusion

  4. PIF Algorithm
     propagation of information with feedback (PIF)
     • used to deliver information on rooted trees from root to leaves and get an ack
     • often considered in stabilization literature; proven ideally- and self- [8,9] as well as snap-stabilizing [1]
     description
     • each process can be in one of three states: idle (i), requesting (rq), replying (rp)
     • root initiates a wave by switching from idle to requesting
     • each intermediate process p propagates the request to its children Ch.p
     • each leaf reflects the wave back by switching from idle to replying
     • intermediate processes propagate the reply back to the root
     • root waits for a reply from all children and repeats the cycle
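The wave described on this slide can be sketched in a few lines of Python. The slide does not give the guards of the actual algorithm from [8,9], so the guards below (in particular the rule that a replying process returns to idle once its parent is idle) are assumptions chosen only to reproduce the fault-free cycle; the names `next_state` and `run_interleaving` are hypothetical, and no claim is made that this simplified version is stabilizing.

```python
import random

IDLE, REQ, REP = "i", "rq", "rp"

def next_state(p, state, parent, children):
    """Return the state p's enabled action would write, or None if p is disabled.
    Guards are assumed for illustration; they are not taken from the slides."""
    kids = [state[c] for c in children[p]]
    s = state[p]
    if parent[p] is None:  # root
        if s == IDLE and all(k == IDLE for k in kids):
            return REQ     # initiate a new wave
        if s == REQ and all(k == REP for k in kids):
            return IDLE    # feedback received from all children
        return None
    up = state[parent[p]]
    if s == IDLE and up == REQ and all(k == IDLE for k in kids):
        # a leaf reflects the wave, an intermediate process propagates it
        return REQ if children[p] else REP
    if s == REQ and all(k == REP for k in kids):
        return REP         # propagate the reply toward the root
    if s == REP and up == IDLE:
        return IDLE        # assumed cleanup rule
    return None

def run_interleaving(parent, steps, seed=0):
    """Run from the all-idle state, one random enabled action per step.
    Returns the number of waves the root completed and the final state."""
    rng = random.Random(seed)
    children = {p: [] for p in parent}
    for p, q in parent.items():
        if q is not None:
            children[q].append(p)
    root = next(p for p, q in parent.items() if q is None)
    state = {p: IDLE for p in parent}
    waves = 0
    for _ in range(steps):
        moves = {}
        for p in parent:
            new = next_state(p, state, parent, children)
            if new is not None:
                moves[p] = new
        if not moves:
            break
        p = rng.choice(sorted(moves))
        if p == root and moves[p] == IDLE:
            waves += 1
        state[p] = moves[p]
    return waves, state
```

For example, `run_interleaving({0: None, 1: 0, 2: 0, 3: 1}, 200)` runs the sketch on a four-process tree rooted at process 0.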

  5. Initial State Selection
     tree selection
     • problem: how to select trees that do not favor a particular topology or shape
     • solution: Prüfer sequence: a sequence of n-2 labels uniquely defines one of all possible labeled trees of n labels
     • a random sequence chooses a labeled tree with equal probability
     initial state – need to select an initial state, then perturb it by faults of varied extent
     • problem: not all states occur with equal probability (ex: root is seldom idle)
     • solution: start from the idle state, randomly pick a number from a range significantly larger than the system size, run the algorithm fault-free for that number of steps, then induce the fault
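The tree-selection step can be illustrated with the standard Prüfer decoding procedure. This is a minimal sketch, not the authors' simulator code: labels are assumed to be 0..n-1, and the function names `prufer_to_edges` and `random_labeled_tree` are hypothetical.

```python
import heapq
import random

def prufer_to_edges(seq):
    """Decode a Prüfer sequence over labels 0..n-1 (n = len(seq) + 2)
    into the n-1 edges of the labeled tree it defines."""
    n = len(seq) + 2
    degree = [1] * n
    for x in seq:
        degree[x] += 1
    # min-heap of current leaves, so the smallest-labeled leaf is removed first
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for x in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, x))
        degree[x] -= 1
        if degree[x] == 1:
            heapq.heappush(leaves, x)
    # exactly two leaves remain; they form the last edge
    u = heapq.heappop(leaves)
    v = heapq.heappop(leaves)
    edges.append((u, v))
    return edges

def random_labeled_tree(n, rng):
    """Draw a uniformly random Prüfer sequence and decode it (requires n >= 2)."""
    seq = [rng.randrange(n) for _ in range(n - 2)]
    return prufer_to_edges(seq)
```

Because sequences and labeled trees are in one-to-one correspondence, drawing each of the n-2 entries uniformly gives every labeled tree the same probability. How the root is chosen afterwards is not stated on the slide.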

  6. Execution Models & Faults
     execution models
     • problem: the execution model should not appear to favor a particular system or architecture
     • solution: selected 3 classic, well-studied execution semantics
     • interleaving – randomly execute one enabled action
     • power-set – randomly pick the number X of actions to execute, randomly pick the first, exclude enabled neighbors; continue until X or all enabled actions are selected; execute the selected actions
     • synchronous – same as power-set, only continue randomly selecting actions until none remains
     faults
     • randomly pick a process and randomly select its state. Note: this may have no observable effect if the fault state is the same as the correct state
     • all processes are faulty – arbitrary initial state: classic stabilization
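The three selection rules differ only in how many enabled processes are picked per step, which the following sketch makes explicit. It is one reading of the slide, under the assumptions that each enabled process has a single enabled action and that X is drawn uniformly from 1 to the number of enabled processes; the function name `select_actions` and its parameters are hypothetical.

```python
import random

def select_actions(enabled, neighbors, model, rng):
    """Pick the processes whose actions execute in one step.
    enabled: set of enabled process ids; neighbors: id -> set of adjacent ids."""
    candidates = sorted(enabled)
    if not candidates:
        return []
    if model == "interleaving":
        limit = 1
    elif model == "power-set":
        limit = rng.randint(1, len(candidates))  # assumed distribution of X
    elif model == "synchronous":
        limit = len(candidates)                  # continue until none remains
    else:
        raise ValueError(f"unknown execution model: {model}")
    chosen = []
    while candidates and len(chosen) < limit:
        p = rng.choice(candidates)
        chosen.append(p)
        # exclude the picked process and its enabled neighbors
        candidates = [q for q in candidates if q != p and q not in neighbors[p]]
    return chosen
```

Excluding neighbors means that no two processes executing in the same step read each other's state, so under this reading "synchronous" selects a maximal set of non-adjacent enabled processes.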

  7. Experiment Setup
     • 100 processes
     • avg. tree height 21.6 ± 4.9
     • avg. number of leaves 37.5 ± 3.1
     • faults varied from one to 100
     • ran 1,000 experiments for each fault number
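Varying the fault count amounts to corrupting a chosen number of processes before each run. The sketch below shows one way to do that; whether the faulty processes are distinct is not stated in the slides, so distinctness is an assumption here, and `inject_faults` and `STATES` are hypothetical names.

```python
import random

STATES = ("i", "rq", "rp")  # idle, requesting, replying

def inject_faults(state, k, rng):
    """Return a copy of state in which k distinct, randomly chosen processes
    receive a random state. A fault may redraw the process's current state
    and thus have no observable effect."""
    faulty = dict(state)
    for p in rng.sample(sorted(state), k):  # distinct processes (assumed)
        faulty[p] = rng.choice(STATES)
    return faulty
```

With 100 processes, `k` would range from 1 to 100, and `k = 100` yields a fully arbitrary state, the classic stabilization setting.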

  8. Metrics
     • stabilization time – number of execution steps for the algorithm to achieve a legitimate state (a single wave)
     • number of actions until stabilization*
     • overhead – number of action executions outside the propagation of the correct wave (wait time for interleaving semantics [1])
     • longest causality chain* – actions are causally related if executed on the same or a neighbor process
     • scale – number of processes in the system
     * metrics were not included in published proceedings
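The longest causality chain can be computed in a single pass over an execution trace. This sketch follows one reading of the one-line definition above: an action extends any chain that ends at an earlier action on the same process or on a neighbor. The trace format (a list of process ids in execution order) and the name `longest_causality_chain` are assumptions.

```python
def longest_causality_chain(trace, neighbors):
    """trace: process ids in the order their actions executed.
    neighbors: id -> set of adjacent ids.
    Returns the length of the longest chain of causally related actions."""
    best = {}      # longest chain ending at the latest action of each process
    longest = 0
    for p in trace:
        prev = max(best.get(q, 0) for q in (p, *neighbors[p]))
        best[p] = prev + 1
        longest = max(longest, best[p])
    return longest
```

Keeping one value per process is enough because a later action on a process always extends the chain of that process's earlier actions, so the latest value is also the largest.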

  9. Stabilization Time

  10. Overhead

  11. Longest Causality Chain

  12. Scale
      • interleaving semantics
      • varied the system size from 100 to 1000 processes
      • fixed % of faults (100% is arbitrary state, classic stabilization)

  13. Analysis
      • simulation results present a detailed picture of algorithm behavior
      • notes
      • effort (overhead, actions, time) rises then diminishes with fault extent. In a legitimate state, a single fault may launch a spurious wave in the opposite direction; stabilization is then proportional to system size. Further faults tend to break up this wave and accelerate stabilization
      • parallel execution semantics (synchronous, power-set) result in greater overhead

  14. Future Research & Conclusion
      • the study is not exhaustive: the fault location affects the system differently. We believe that a fault closer to the root has a greater ability to perturb the system state
      • engagement with practice provides feedback for stabilization research: designers are induced to consider and address the problems of practical import
      • in our case, the sparse-fault spurious “counter-wave” was wholly unexpected and may need algorithmic measures to handle it

  15. Thank You Questions?
