
Design of Self-Managing Dependable Systems with UML and Fault Tolerance Patterns



Presentation Transcript


  1. Design of Self-Managing Dependable Systems with UML and Fault Tolerance Patterns Matthias Tichy, Daniela Schilling, Holger Giese

  2. Application Example - RailCab

  3. Application Example - RailCab • Vision: Combine • individual transport • cost-effectiveness of public transport • Autonomously operating shuttles using linear drive (maglev train) • Build contact-free convoys for energy savings • Information about the exact position necessary • Computation based on the stator waves

  4. Motivation • Example service structure: :StatorWaveSensor, :PositionCalculation, :ConvoyController • Problem: the position calculation service must be highly reliable

  5. Contents • Problem: the position calculation service must be highly reliable • Solution: self-healing, fault-tolerant software • Application of software fault tolerance patterns, which capture: the abstract service structure of fault tolerance techniques; deployment restrictions (explicitly taking fault tolerance into account) • Automatic deployment: initial deployment; self-healing by deployment reconfiguration during runtime

  6. Fault Tolerance Patterns • Capture the abstract structure of a well-known fault tolerance technique • Example: Triple Modular Redundancy (TMR) • Diagram (Triple Modular Redundancy) roles: :Provider, :Multiplier, :Service1, :Service2, :Service3, :Voter, :User
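The multiplier and voter roles of the TMR pattern can be illustrated with a short Python sketch. It is not taken from the slides: the function names `multiply` and `vote`, the three example replicas, and the rule that a strict majority of all replicas must agree are assumptions made for illustration.

```python
from collections import Counter

def multiply(value, services):
    """Multiplier role: forward one input to every redundant service."""
    results = []
    for service in services:
        try:
            results.append(service(value))
        except Exception:
            results.append(None)  # a crashed replica contributes no vote
    return results

def vote(results):
    """Voter role: return the value a strict majority of replicas agree on, else None."""
    counts = Counter(r for r in results if r is not None)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count * 2 > len(results) else None

# Hypothetical replicas of a position calculation (integer arithmetic only).
def good(wave_count):
    return wave_count * 2

def faulty(wave_count):
    return wave_count * 2 + 1  # returns a wrong value

def crashed(wave_count):
    raise RuntimeError("replica crashed")

one_wrong = vote(multiply(21, [good, good, faulty]))       # majority masks the wrong value
one_crash = vote(multiply(21, [good, crashed, good]))      # one crash is tolerated
two_crashes = vote(multiply(21, [good, crashed, crashed])) # no majority left
```

With two of three replicas agreeing, the voter still delivers a result; after a second crash it returns `None`, which matches the later observation that a second crash is not tolerable.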

  7. Fault Tolerance Patterns - Deployment • Naïve, unrestricted deployment may yield unwanted results • Deployment must respect restrictions imposed by the fault tolerance technique (redundancy, heterogeneity) • Instead of manual deployment, enrich fault tolerance patterns with deployment restrictions • Diagram: node Arthur with <<deploy>> links to <<Service1>> :PositionCalculation, <<Service2>> :PositionCalculation, <<Service3>> :PositionCalculation

  8. Deployment restrictions • Deployment restrictions for the TMR pattern: • Avoid single point of failure of voter / multiplier -> deploy voter and user to the same node (if the user fails, the failure of the voter is no problem) • Avoid crash failures -> deploy redundant services to distinct nodes • Heterogeneous hardware platform -> require different CPUs: { $\text{Node3.CPU} \neq \text{Node4.CPU} \land \text{Node4.CPU} \neq \text{Node5.CPU} \land \text{Node3.CPU} \neq \text{Node5.CPU}$ } • Diagram: Node1 and Node2 host :Provider, :Multiplier, :Voter, :User; Node3, Node4, Node5 host :Service1, :Service2, :Service3
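The three restrictions listed for the TMR pattern can be checked against a concrete deployment with a few lines of Python. This is a minimal sketch, not the authors' tooling: the function name `check_tmr_deployment`, the dictionary layout, and the CPU labels `cpuA`, `cpuB`, `cpuC` are hypothetical.

```python
REPLICAS = ("Service1", "Service2", "Service3")

def check_tmr_deployment(deployment, cpu):
    """Return a list of violated TMR deployment restrictions (empty if none).

    deployment maps a pattern role to a node name; cpu maps a node name to a CPU type.
    """
    violations = []
    # Voter and user must share a node.
    if deployment["Voter"] != deployment["User"]:
        violations.append("voter and user must be deployed to the same node")
    replica_nodes = [deployment[role] for role in REPLICAS]
    # Redundant services must run on distinct nodes.
    if len(set(replica_nodes)) != len(REPLICAS):
        violations.append("redundant services must be deployed to distinct nodes")
    # Nodes of redundant services must have pairwise different CPUs.
    if len({cpu[node] for node in replica_nodes}) != len(REPLICAS):
        violations.append("redundant services require pairwise different CPUs")
    return violations

# Hypothetical CPU types; the slides do not name any.
cpu = {"Node1": "cpuA", "Node2": "cpuA", "Node3": "cpuA", "Node4": "cpuB", "Node5": "cpuC"}

restricted = {"Provider": "Node1", "Multiplier": "Node1", "Voter": "Node2", "User": "Node2",
              "Service1": "Node3", "Service2": "Node4", "Service3": "Node5"}
naive = {role: "Node1" for role in restricted}  # everything on a single node

restricted_violations = check_tmr_deployment(restricted, cpu)
naive_violations = check_tmr_deployment(naive, cpu)
```

The naive single-node deployment satisfies the voter/user restriction trivially but violates both redundancy restrictions.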

  9. Fault Tolerance Patterns • Original structure: :StatorWaveSensor, :PositionCalculation, :ConvoyController • Application of TMR pattern: <<Provider>> sen:StatorWaveSensor, <<Multiplier>> mult:Multiplier, <<Service1>> pc1:PositionCalculation, <<Service2>> pc2:PositionCalculation, <<Service3>> pc3:PositionCalculation, <<Voter>> vot:Voter, <<User>> cc:ConvoyController

  10. Deployment • Compute a correct / reliable deployment • Use a standard constraint solver • Services: <<Provider>> sen:Sensor, <<Multiplier>> mult:Multiplier, <<Service1>> pc1:PositionCalculation, <<Service2>> pc2:PositionCalculation, <<Service3>> pc3:PositionCalculation, <<Voter>> vot:Voter, <<User>> cc:Convoy • Nodes: Avalon, Uther, Gorlois, Taliesin, Gareth, Arthur

  11. Compute a correct / reliable deployment • Mapping to a standard constraint solver • Matrix of services (i) by nodes (j); an entry of 1 means, for example, that service pc1 is deployed to node Gorlois • Variables $x_{i,j} \in \{0,1\}$ • Constraint (each service is deployed to exactly one node): $\forall i \in \text{Services}: \sum_{j} x_{i,j} = 1$
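The 0/1 encoding can be made concrete with a small Python sketch. The helper names `to_variables` and `each_service_on_exactly_one_node` are hypothetical, and only the three position calculation services are encoded to keep the example short; their node assignments follow the deployment shown on the repair slide.

```python
SERVICES = ["pc1", "pc2", "pc3"]
NODES = ["Avalon", "Uther", "Gorlois", "Taliesin", "Gareth", "Arthur"]

def to_variables(deployment, services, nodes):
    """Encode a deployment as binary variables x[(i, j)]: 1 if service i runs on node j."""
    return {(s, n): int(deployment.get(s) == n) for s in services for n in nodes}

def each_service_on_exactly_one_node(x, services, nodes):
    """Check the constraint: for all services i, the sum over nodes j of x[i, j] equals 1."""
    return all(sum(x[s, n] for n in nodes) == 1 for s in services)

x = to_variables({"pc1": "Gorlois", "pc2": "Uther", "pc3": "Arthur"}, SERVICES, NODES)
valid = each_service_on_exactly_one_node(x, SERVICES, NODES)

# Deploying pc1 to a second node breaks the constraint.
x_double = dict(x)
x_double[("pc1", "Uther")] = 1
double_valid = each_service_on_exactly_one_node(x_double, SERVICES, NODES)
```

A constraint solver searches over these variables directly; the sketch only evaluates the constraint for a given assignment.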

  12. Compute a correct / reliable deployment • Restriction: services must be executed on the same node • Graphically: <<Voter>> vot:Voter and <<User>> cc:ConvoyController, each with a <<deploys>> link • Constraint: $\forall j \in \text{Nodes}: (x_{vot,j} + x_{cc,j} = 2) \lor (x_{vot,j} + x_{cc,j} = 0)$

  13. Compute a correct / reliable deployment • Initial deployment, shown graphically • Services: pc1, pc2, pc3, sen, mul, vot, cc • Nodes: Avalon, Gareth, Taliesin, Gorlois, Uther, Arthur
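An initial deployment that satisfies the restrictions can be computed, for a system this small, by plain exhaustive search. The sketch below deliberately swaps the constraint solver named in the slides for `itertools.product`; the CPU types assigned to the nodes are hypothetical, and the function names are illustrative only.

```python
from itertools import product

NODES = ["Avalon", "Uther", "Gorlois", "Taliesin", "Gareth", "Arthur"]
SERVICES = ["sen", "mult", "vot", "cc", "pc1", "pc2", "pc3"]
REPLICAS = ("pc1", "pc2", "pc3")

# Hypothetical CPU types; the slides only require that replica nodes differ.
CPU = {"Avalon": "A", "Uther": "A", "Gorlois": "B",
       "Taliesin": "B", "Gareth": "C", "Arthur": "C"}

def satisfies_restrictions(deployment):
    """Check the TMR deployment restrictions for a service -> node mapping."""
    replica_nodes = [deployment[s] for s in REPLICAS]
    return (deployment["vot"] == deployment["cc"]               # voter and user together
            and len(set(replica_nodes)) == 3                    # distinct nodes
            and len({CPU[n] for n in replica_nodes}) == 3)      # distinct CPUs

def initial_deployment():
    """Return the first assignment found that satisfies all restrictions, or None."""
    # Mapping each service to one node makes "exactly one node per service" implicit.
    for choice in product(NODES, repeat=len(SERVICES)):
        deployment = dict(zip(SERVICES, choice))
        if satisfies_restrictions(deployment):
            return deployment
    return None

deployment = initial_deployment()
```

Exhaustive search grows exponentially with the number of services, which is one reason to hand the same constraints to a dedicated solver instead.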

  14. Repair • Due to TMR, one crashed redundant service is tolerable • A second crash is not tolerable • Therefore: self-heal by restarting the crashed service on a working node • But on which node? Compute it in a timely manner! • Diagram: Gorlois <<deploys>> <<Service1>> pc1:PositionCalculation; Uther <<deploys>> <<Service2>> pc2:PositionCalculation; Arthur <<deploys>> <<Service3>> pc3:PositionCalculation

  15. Repair • Solve restricted deployment problem • Only consider changed values • Diagram: Gorlois <<deploys>> <<Service1>> pc1:PositionCalculation, marked "Crash Failure" • Fix variables $x_{i,j}$ of unaffected service/node combinations • Free variables $x_{i,j}$ for affected service/node combinations • Service migration should be avoided!
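The restricted repair problem can be sketched in Python as follows. Services on the crashed node are treated as free, all others stay fixed, and only working nodes are candidates. The function name `repair`, the `is_valid` callback, and the use of exhaustive search instead of a constraint solver are assumptions for illustration; for brevity only the distinct-nodes restriction is checked.

```python
from itertools import product

NODES = ["Avalon", "Uther", "Gorlois", "Taliesin", "Gareth", "Arthur"]

def repair(deployment, crashed_node, nodes, is_valid):
    """Redeploy only the services of the crashed node; return None if impossible."""
    working = [n for n in nodes if n != crashed_node]
    # Fixed variables: unaffected services keep their node (no migration).
    fixed = {s: n for s, n in deployment.items() if n != crashed_node}
    # Free variables: affected services may go to any working node.
    free = [s for s, n in deployment.items() if n == crashed_node]
    for choice in product(working, repeat=len(free)):
        candidate = {**fixed, **dict(zip(free, choice))}
        if is_valid(candidate):
            return candidate
    return None

def replicas_on_distinct_nodes(deployment):
    """Restriction: the three redundant services run on pairwise distinct nodes."""
    return len({deployment[s] for s in ("pc1", "pc2", "pc3")}) == 3

before = {"pc1": "Gorlois", "pc2": "Uther", "pc3": "Arthur"}
after = repair(before, "Gorlois", NODES, replicas_on_distinct_nodes)
```

Because the unaffected services are fixed, the search space shrinks to the number of working nodes raised to the number of affected services, which helps compute the repair quickly.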

  16. Repair • No solution found -> relax the problem • Service migration necessary • Diagram: first round -> second round -> third round • Legend: fixed variables, free variables
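One possible reading of the relaxation rounds is sketched below in Python. The slides do not say which variables each round frees, so the order used here (round 1 frees only the affected services, each later round frees one more unaffected service) is an assumption. The one-service-per-node capacity rule and the CPU types are also hypothetical; they exist only to make the first round fail in the example.

```python
from itertools import combinations, product

def repair_with_relaxation(deployment, crashed_node, nodes, is_valid):
    """Return (new_deployment, migrated_services), or (None, ()) if no round succeeds."""
    working = [n for n in nodes if n != crashed_node]
    affected = [s for s, n in deployment.items() if n == crashed_node]
    unaffected = [s for s in deployment if s not in affected]
    # extra == 0 is the first round; each further round frees one more service.
    for extra in range(len(unaffected) + 1):
        for migrated in combinations(unaffected, extra):
            free = affected + list(migrated)
            fixed = {s: deployment[s] for s in unaffected if s not in migrated}
            for choice in product(working, repeat=len(free)):
                candidate = {**fixed, **dict(zip(free, choice))}
                if is_valid(candidate):
                    return candidate, migrated
    return None, ()

NODES = ["Gorlois", "Uther", "Arthur", "Avalon", "Taliesin"]
CPU = {"Gorlois": "B", "Uther": "A", "Arthur": "C", "Avalon": "B", "Taliesin": "A"}  # hypothetical

def is_valid(deployment):
    one_service_per_node = len(set(deployment.values())) == len(deployment)  # hypothetical capacity
    replica_cpus = {CPU[deployment[s]] for s in ("pc1", "pc2", "pc3")}
    return one_service_per_node and len(replica_cpus) == 3

before = {"pc1": "Gorlois", "pc2": "Uther", "pc3": "Arthur", "sen": "Avalon"}
after, migrated = repair_with_relaxation(before, "Gorlois", NODES, is_valid)
```

In this example the first round finds no node for pc1, because the only empty node has the same CPU type as the node of pc2. The second round frees `sen`, moves it to Taliesin, and places pc1 on Avalon, so exactly one service migrates.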

  17. Tool Support • Current work

  18. Conclusions / Future Work • Fault tolerance patterns capture fault tolerance techniques • Contain: structure + deployment restrictions • Graphical specification • Easy application • Automatic deployment: initial deployment; self-healing by deployment reconfiguration during runtime • Future work: heterogeneous service restrictions in pattern application; synthesize behavior of voter and multiplier services; use fault tolerance application knowledge for automatic fault tree analysis

  19. Questions?

  20. Compute a correct / reliable deployment • Restriction: services must be deployed on distinct nodes • Graphically: <<Service1>> pc1:PositionCalculation and <<Service2>> pc2:PositionCalculation, each with a <<deploys>> link • Constraint: $\forall j \in \text{Nodes}: x_{pc1,j} + x_{pc2,j} \le 1$

  21. Compute a correct / reliable deployment • Restriction: the CPUs of the nodes for redundant services must be distinct • Constraint: $x_{s1,j_1} \land x_{s2,j_2} \land x_{s3,j_3} \Rightarrow j_1.\text{cpu} \neq j_2.\text{cpu} \neq j_3.\text{cpu}$

  22. Multistage • Multiple applications of TMR • Transform to a multiple-stage arrangement with tripled voter and multiplier • Deployment restrictions are transformed too • Diagram elements: <<Provider>> sen:Sensor; <<Multiplier>> mult:Multiplier; pc1, pc2, pc3:PositionCalculation, stereotyped <<Service1>>, <<Service2>>, <<Service3>> and <<Provider>>; three <<Multiplier>> :PCMultiplier; three <<Voter>> :Voter; three cc:Convoy, stereotyped <<User>> and <<Service1>>, <<Service2>>, <<Service3>>

  23. Tool Support • Already working: graphical specification of fault tolerance patterns and deployment restrictions; automatic mapping to ILOG solver software • Next: runtime support

  24. Motivation • System without fault tolerance, node crash -> (1) fault tolerance pattern • Introduce redundancy (TMR) but deploy two redundant services on one node, crash of that node • Better: deploy redundant services to distinct nodes -> (2) deployment restrictions for fault tolerance pattern • Example with distinct nodes, crash failure, everything OK -> (3) which nodes? • Now self-heal the system by restarting a crashed service on a new node -> (4) which node?

  25. Motivation • Goals: easy application of fault tolerance techniques; deployment issues taken into account; easy deployment specification; reusable deployment specification
