1 / 21

Pontus Boström and Marina Waldén Åbo Akademi University/ TUCS

Pontus Boström and Marina Waldén Åbo Akademi University/ TUCS. Development of Fault Tolerant Grid Applications Using Distributed B. Motivation. Grids have become widespread in organizations Handle large amount of information Manage computational resources

tammy
Download Presentation

Pontus Boström and Marina Waldén Åbo Akademi University/ TUCS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pontus Boström and Marina Waldén Åbo Akademi University/ TUCS Development of Fault Tolerant Grid Applications Using Distributed B

  2. Motivation • Grids have become widespread in organizations • Handle large amount of information • Manage computational resources • Difficult to implement “correct” Grid applications • Formal methods useful in order to ensure correctness of specifications • Can be difficult to implement • The specification language should take into account the features of the underlying platform • Fault tolerance also important for correctness due to the nature of the grid environment

  3. Grids • Used for large-scale distributed systems • Scientific computing, e.g., in Physics and engineering • Business applications • Share information and computational resources over organizational boundaries • Loosely coupled systems • SOA • Client – Server architecture

  4. Client Host Grid service Client Grid service Grid services • Services in a grid environment that can be accessed by clients • Similar to remote objects in CORBA and RMI • Remote procedures used for communication

  5. Remote procedure call Client Grid service Notification Grid Services • Based on Web Services • XML • SOAP • WSDL • Extends Web services with • Potentially transient services containing state • Service data • Notifications • Globus Toolkit middleware

  6. The architecture of a grid application Host 1 Host 2 Grid service Client Globus T. Globus T. OS OS RPC Notification

  7. Faults • Remote procedure calls can fail in four different ways • The server grid service instance has crashed before the call • The network connection fails when calling a remote procedure • The server instance fails during the call • The network connection fails when returning the result • A notification can fail to arrive for two reasons • The sending grid service crashed before sending the notification • The network connection fails during sending • The client crashes when using a server grid service • The server grid service becomes an orphan

  8. Fault tolerance using GT • No support for advanced fault tolerance mechanisms such as replication or check-pointing • Exception is raised in the caller when a call to a remote procedure fails • Not easy to know what caused the exception to be raised • That a notifications is lost can be discovered with timers in the client • The most difficult error to handle is removal of orphan grid service instances

  9. Orphan control • Need to remove orphan grid service instances, since they waste resources • New remote procedure isAlive • Timer in both client and server grid service

  10. Orphan control • Need to remove orphan grid service instances, since they waste resources • New remote procedure isAlive • Timer in both client and server grid service • An exception is raised in the client when a call to isAlive fails • The server grid service is deleted when a timeout occurs in it

  11. Extension of the B Method Developed by J. R. Abrial Based on Action Systems by Back and Kurki-Suonio Related To B Action Systems Event B SYSTEMC VARIABLES x INVARIANT Inv_C INITIALISATION x := x0 EVENTS C_Evt1 = ANY u WHERE G1(u,x) THEN S1 END; C_Evt2= SELECT G2 THEN S2 END; END

  12. Formal development of Grid applications • We like to have a formal method suitable for developing fault tolerant grid applications • Difficult to create implementable specifications of grid applications in Event B • No grid communication mechanisms such as remote procedures and notifications • No fault tolerance mechanisms • Difficult to implement due to synchronization issues and atomicity of events • We need to extend Event B with constructs for • Specifying grid services • Remote procedure calls and notifications • Fault tolerance • Extensions should be introduced in a manner that simplifies implementation

  13. Distributed B • Provides two new types of B machines • GRIDSERVICE • GRID_REFINEMENT • Take into account grid specific features • Remote procedures • Notifications • Timeouts due to lost notifications • Exceptions due to failed calls to isAlive • Enables us to prove properties about the entire system • Are translated to ordinary B for verification • New constructs get their semantics from the translation • Automatic generation of proof obligations • Enable automatic or semi-automatic translation of the specification to a programming language

  14. Grid service machine • Abstract specification of a grid service • A grid service machine is a template that clients obtain instances of • Compare to Classes in OO • Remote procedures • Ordinary B procedures called from a client • Events • Executed independently of a client • Notifications • Sent when all events have become disabled Grid service Remote procedures: Proc(p) Events: J1T1 J2T2 Notifications: (J1  J2)  Q

  15. Grid refinement machine (1) • A client that uses grid service machine instances • Refines GRIDSERVICE, ordinary SYSTEM or REFINEMENT • Clause for enabling dynamic management of grid service machine instances • Instances are used as variables • When a failed instance is discovered it is marked as no longer in use and deleted from the application • Clause for refining remote procedures • Clause for refining events

  16. Grid refinement machine (2) • Special substitution used in events for making remote procedures calls • Enables the exceptions for failed calls to be handled • Special events that consists of two parts for handling notifications • First part enabled when a notification has been sent from a grid service • Second part enabled when a timeout occurs • Executed once for each notification/timeout • Special event for handling failed calls to isAlive • Enabled for each grid service instance in use. • Non-deterministically models failures of instances

  17. The behaviour of grid components Grid refinement Grid service Events: Remote procedures: Proc(p) G1S1 Events: G2S2 J1T1 G3S3 J2T2 Notifications: (J1  J2)  Q Notification handlers: NotifHandler

  18. Grid service machine GRIDSERVICE A VARIABLES y INVARIANT Inv_A INITIALISATION y := y0 REMOTE_PROCEDURES Proc(p) = PRE P(p) THEN T END EVENTS A_Evt1 = ANY u WHERE J1(u,y) THEN T1 END; A_Evt2 = SELECT J2 THEN T2 END; NOTIFICATIONS Notif = GUARANTEES Q END END

  19. Grid refinement machine (1) GRID_REFINEMENT C2 REFINES C1 REFERENCES A VARIABLES z,x,a_inst INVARIANT a_inst:A & Inv_C’ INITIALISATION x := x0 || z:=z0 || a_inst::A EVENTS C_Evt1 = SELECT G1’ THEN CALL a_inst.Proc(x) EXCEPTION SE END|| S1’ END; C_Evt2 = SELECT G2’ THEN S2’ END

  20. Grid refinement machine (2) NOTIFICATION_HANDLERS NotifHandler = NOTIFICATION Notif SOURCE v:A THEN S3 TIMEOUT ST END IS_ALIVE_HANDLERS IAHandler = SOURCE v:A THEN SIA END END

  21. Conclusions • Enables construction of correct fault tolerant grid applications • Automatic generation of proof obligations • Implementable architecture by construction • These Event B extensions can also use other middleware for distributed systems

More Related