
Building Fault-Tolerant Consistency Protocols for an Adaptive Grid Data-Sharing Service

Gabriel Antoniu, Jean-François Deverge, Sébastien Monnet. PARIS Research Group, IRISA/INRIA Rennes, France. Workshop on Adaptive Grid Middleware (AGridM 2004), September 2004, Juan-les-Pins.


Presentation Transcript


  1. Building Fault-Tolerant Consistency Protocols for an Adaptive Grid Data-Sharing Service
  Gabriel Antoniu, Jean-François Deverge, Sébastien Monnet
  PARIS Research Group, IRISA/INRIA Rennes, France
  Workshop on Adaptive Grid Middleware (AGridM 2004), September 2004, Juan-les-Pins

  2. Context: Grid Computing
  • Target architecture: cluster federations (e.g. GRID 5000)
  • Target applications: distributed numerical simulations (e.g. code coupling)
  • Problem: what is the right approach to data sharing?
  [Figure: satellite design as a code-coupling example, combining solid mechanics, optics, dynamics and thermodynamics codes]

  3. Current Approaches: Explicit Data Management
  • Explicit data localization and transfer
    • GridFTP [ANL], MPICH-G2 [ANL]: security, parallel transfer
    • Internet Backplane Protocol [UTK]
  • Limitations
    • Application complexity at large scale
    • No consistency guarantees for replicated data

  4. Handling Consistency: Distributed Shared Memory Systems
  • Features:
    • Uniform access to data via a global identifier
    • Transparent data localization and transfer
    • Consistency models and protocols
  • But: small-scale, static architectures
  • Challenge on a grid architecture: integrate new hypotheses!
    • Scalability
    • Dynamic nature
    • Fault tolerance
  [Figure: migration vs. replication of data between Node 0 and Node 1]

  5. Case Study: Building a Fault-Tolerant Consistency Protocol
  • Starting point: a home-based protocol for entry consistency
    • Relaxed consistency model
    • Explicit association of data to locks
    • MRSW: Multiple Readers, Single Writer
      • acquire(L): exclusive (write) access
      • acquireRead(L): shared (read) access
    • Implemented by a home-based protocol
  [Figure: home node and a client]
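To make the primitives concrete, here is a minimal Java sketch of how an application might use them. Only acquire, acquireRead and release come from the slide; the SharedData interface and everything else is hypothetical, not JuxMem's actual API.

    // Illustrative sketch of the MRSW entry-consistency primitives named on
    // this slide; the SharedData interface and its method names are hypothetical.
    interface SharedData {
        void acquire();      // acquire(L): exclusive (writer) access to the data bound to lock L
        void acquireRead();  // acquireRead(L): shared (reader) access, multiple readers allowed
        void release();      // release(L): publish local updates toward the home node
        int read();
        void write(int value);
    }

    class EntryConsistencyUsage {
        // Single writer at a time: updates become visible to others after release.
        static void update(SharedData x) {
            x.acquire();
            x.write(42);
            x.release();
        }

        // Multiple readers may hold the lock concurrently.
        static int observe(SharedData x) {
            x.acquireRead();
            int v = x.read();
            x.release();
            return v;
        }
    }

Under entry consistency, only the data explicitly associated with lock L is guaranteed to be up to date after acquire(L), which is what keeps the protocol cheap.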

  6. Home-Based Protocol
  [Sequence diagram: clients A and B, in clusters A and B, each send acquire to the home and receive the lock, send read x and receive data x, perform writes w(x), then release; the home serializes accesses to x.]

  7. Going Large Scale: a Hierarchical Consistency Protocol
  • Inspired by CLRC [LIP6, Paris] and H2BRC [IRISA, Rennes]
  [Figure: a global home coordinating three local homes; clients attach to their cluster's local home]
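A hedged sketch of the hierarchy implied here, assuming a simple scheme in which a local home contacts the global home only when its cluster does not already hold the lock. The class names (GlobalHome, LocalHome) are illustrative, not the actual CLRC/H2BRC code.

    // Hypothetical sketch of hierarchical lock acquisition: a client asks its
    // cluster's local home; the local home crosses the cluster boundary to
    // the global home only when necessary.
    class GlobalHome {
        private LocalHome owner;                  // cluster currently holding the lock

        synchronized void acquire(LocalHome requester) throws InterruptedException {
            while (owner != null && owner != requester) {
                wait();                           // wait for the owning cluster to release
            }
            owner = requester;
        }

        synchronized void release() {
            owner = null;
            notifyAll();
        }
    }

    class LocalHome {
        private final GlobalHome global;
        private boolean clusterHoldsLock;

        LocalHome(GlobalHome global) { this.global = global; }

        synchronized void acquire() throws InterruptedException {
            if (!clusterHoldsLock) {              // cross the cluster boundary only when needed
                global.acquire(this);
                clusterHoldsLock = true;
            }
            // Serialization among clients of the same cluster would happen here.
        }

        synchronized void release() {
            clusterHoldsLock = false;             // a real protocol might keep the lock cached locally
            global.release();
        }
    }

The point of the hierarchy is that repeated acquisitions within one cluster avoid wide-area traffic entirely.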

  8. Problem: Critical Entities May Crash
  • How to support home crashes on a grid infrastructure?
  [Figure: the same hierarchy of global and local homes; any home may crash]

  9. Idea: Use Fault-Tolerant Components
  • Replicate critical entities on a group of nodes
  • Groups of nodes managed using the group membership abstraction
  • Rely on atomic multicast
  • Example architecture: A. Schiper [EPFL]
  [Figure: layered fault-tolerance stack: adapter; group communication and group membership; atomic multicast; consensus; failure detector; communications]
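The stack in the figure can be read as a set of small interfaces. A hypothetical Java rendering, assuming the standard semantics of these abstractions, not Schiper's actual framework:

    // Hypothetical interfaces mirroring the layered stack on this slide
    // (failure detector, consensus, atomic multicast, group membership).
    import java.util.List;

    interface FailureDetector {
        boolean isSuspected(String nodeId);       // unreliable suspicion that a node crashed
    }

    interface Consensus {
        byte[] propose(byte[] value);             // all correct members decide one common value
    }

    interface AtomicMulticast {
        void multicast(byte[] message);           // send to the whole group
        byte[] deliverNext();                     // delivered in the same total order everywhere
    }

    interface GroupMembership {
        List<String> currentView();               // agreed-upon list of live members
        void onViewChange(Runnable handler);      // invoked when members join, leave, or crash
    }

Each layer is built on the one below: atomic multicast is typically implemented over consensus, which in turn relies on failure detection.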

  10. Approach: Decoupled Design
  • The consistency protocol layer and the fault-tolerance layer are separated
  • Their interaction is defined by a junction layer (see the sketch below)
  [Figure: consistency layer on top, junction layer in the middle, fault-tolerance layer below]
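A minimal sketch of what such a junction layer could look like, assuming state-machine replication over the AtomicMulticast interface sketched under slide 9. HomeStateMachine and JunctionLayer are invented names, not the authors' implementation.

    // Hypothetical junction layer: the consistency protocol sees a replicated
    // home as one logical entity, while the junction layer turns each operation
    // into an atomic multicast so that every replica applies the same
    // operations in the same order.
    interface HomeStateMachine {
        void apply(byte[] operation);             // deterministic state transition
    }

    class JunctionLayer {
        private final AtomicMulticast group;      // fault-tolerance building block
        private final HomeStateMachine home;      // consistency-protocol state

        JunctionLayer(AtomicMulticast group, HomeStateMachine home) {
            this.group = group;
            this.home = home;
        }

        // Called by the consistency protocol layer on any replica.
        void submit(byte[] operation) {
            group.multicast(operation);           // ordered delivery to all replicas
        }

        // Delivery loop, identical on every replica: state-machine replication.
        void run() {
            while (true) {
                home.apply(group.deliverNext());
            }
        }
    }

Because every replica applies the same totally ordered operations, a crashed home replica can be replaced without the consistency layer noticing; this is exactly the decoupling the slide argues for.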

  11. Consistency/Fault-Tolerance Interaction
  • Critical consistency protocol entities implemented as fault-tolerant node groups
  • Group management using traditional group membership and group communication protocols
  • Junction layer handles:
    • Group self-organization
    • Configuration of new group members
  [Figure: layer stack (consistency protocol, dynamic CP configuration, group self-organization, fault-tolerant building blocks) and a node group whose membership evolves as nodes fail and join]

  12. Replicate Critical Entities Using Fault-Tolerant Components
  • Rely on replication techniques and group communication protocols used in fault-tolerant distributed systems
  [Figure: the global home and each local home replaced by replicated entities, still serving clients]

  13. Replicate Critical Entities Using Fault-Tolerant Components
  • Rely on replication techniques and group communication protocols used in fault-tolerant distributed systems
  • GDG: Global Data Group
  • LDG: Local Data Group
  [Figure: the global home becomes the GDG; each local home becomes an LDG serving its cluster's clients]
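A hypothetical sketch of the resulting two-level structure; the classes and fields are illustrative only:

    // Hypothetical data structures for the two-level replication scheme:
    // one Global Data Group (GDG) spanning clusters, composed of one Local
    // Data Group (LDG) of replica nodes per cluster.
    import java.util.List;
    import java.util.Map;

    class LocalDataGroup {
        final String clusterId;
        final List<String> replicas;              // nodes in one cluster acting as the local home

        LocalDataGroup(String clusterId, List<String> replicas) {
            this.clusterId = clusterId;
            this.replicas = replicas;
        }
    }

    class GlobalDataGroup {
        // The GDG is a group of LDGs: the global home survives even the loss
        // of a whole cluster as long as other LDGs remain reachable.
        final Map<String, LocalDataGroup> ldgs;   // keyed by cluster id

        GlobalDataGroup(Map<String, LocalDataGroup> ldgs) { this.ldgs = ldgs; }

        LocalDataGroup localHomeFor(String clusterId) { return ldgs.get(clusterId); }
    }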

  14. The JuxMem Framework
  • DSM systems: consistency and transparent access
  • P2P systems: scalability and high dynamicity
  • Based on JXTA, a P2P framework [Sun Microsystems]
  [Figure: virtual architecture (juxmem group containing cluster groups A, B and C plus a data group) mapped onto the physical architecture]

  15. Implementation in JuxMem
  • Data group ≈ GDG + LDG
  [Figure: virtual architecture with the GDG spanning LDGs inside cluster groups A, B and C, mapped onto the physical architecture]

  16. Preliminary Evaluation
  • Experiments:
    • Allocation cost depending on the replication degree
    • Cost of the basic data access operations (read/update)
  • Testbed: paraci cluster (IRISA)
    • Dual Pentium IV nodes, 2.4 GHz, 1 GB RAM, 100 Mb/s Ethernet
    • Emulation of 6 clusters of 8 nodes
  [Figure: juxmem group containing cluster groups A to F]

  17. Allocation Process
  • Discover n providers according to the specified replication degree
  • Send an allocation request to the n discovered providers
  • On each provider receiving an allocation request: instantiate the protocol layer and the fault-tolerant building blocks (sketched below)
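The three steps map naturally onto a short routine. A sketch with invented names (Allocator, Provider, AllocationRequest); the discovery step is deliberately left abstract (in JuxMem it would go through JXTA's discovery mechanisms).

    // Hypothetical sketch of the allocation steps listed above.
    import java.util.List;

    class Allocator {
        interface Provider {
            void allocate(AllocationRequest request);   // step 3 runs on the provider
        }

        record AllocationRequest(String dataId, int sizeInBytes, int replicationDegree) {}

        // Step 1: find n providers matching the requested replication degree.
        List<Provider> discoverProviders(int n) {
            throw new UnsupportedOperationException("discovery omitted in this sketch");
        }

        // Step 2: send an allocation request to each discovered provider;
        // each provider then instantiates its consistency protocol layer and
        // its fault-tolerant building blocks.
        void allocate(String dataId, int sizeInBytes, int replicationDegree) {
            for (Provider p : discoverProviders(replicationDegree)) {
                p.allocate(new AllocationRequest(dataId, sizeInBytes, replicationDegree));
            }
        }
    }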

  18. Preliminary Evaluation: Allocation Cost
  [Figure: measured allocation cost as the replication degree varies]

  19. Cost of Basic Primitives: read/update
  [Figure: measured cost of the read and update operations]

  20. Conclusion
  • Handling consistency of mutable, replicated data in a volatile environment
  • An experimental platform for studying the interaction between fault tolerance and consistency protocols
  [Figure: layer stack: consistency protocol, dynamic CP configuration, proactive group membership, fault-tolerant building blocks]

  21. Future Work (AGridM 2003)
  • Consistency protocols in a dynamic environment
  • Replication strategies for fault tolerance
  • Co-scheduling computation and data distribution
  • Integrate high-speed networks: Myrinet, SCI


  23. Future Work (AGridM 2004)
  • Goal: build a Grid Data Service
  • Experiment with various implementations of the fault-tolerant building blocks (atomic multicast, failure detectors, …)
  • Parameterizable replication techniques
  • Experiment with various consistency protocols combined with various replication techniques
  • Experiment with realistic grid applications at large scale
  • GDS (Grid Data Service) project of ACI MD: http://www.irisa.fr/GDS

  24. Transparent Data Localization
  • DSM systems
    • Small scale
    • No volatility support
    • Mutable data
  • P2P systems
    • Large scale
    • Volatility support
    • Immutable data

  25. Goal
  • A data-sharing service for the grid
    • Uniform access to data via a global identifier
    • Transparent data localization and transfer
    • Consistency models and protocols
    • Volatility support
  • A hybrid system drawing on DSM systems and P2P systems

  26. How to Address Fault Tolerance and Data Consistency at the Same Time?
  • Both use replication
  • Two main approaches:
    • Integrated design
      • Copies created for locality serve as backups, and vice versa
      • Complex architecture and protocols
    • Decoupled design
      • Separate roles, cleaner architecture
      • Leverages existing consistency protocols and existing fault-tolerance replication techniques
      • Limited interaction via a well-specified interface
