
A Sqrt(N) Algorithm for Mutual Exclusion in Decentralized Systems


  1. A Sqrt(N) Algorithm for Mutual Exclusion in Decentralized Systems M. Maekawa University of Tokyo

  2. The Problem of Mutual Exclusion • Exclusive access to a shared resource is sometimes essential for consistency. • The problem of mutual exclusion is to grant a requestor exclusive access to a shared resource whenever it asks for one.

  3. Previous Work • Ricart and Agrawala • O(N) messages. • Each node communicates with every other node. • Thomas • Majority voting, still O(N) messages • Gifford and Skeen • Majority voting with a non-uniform distribution of votes; Centralized

  4. Introduction • Maekawa’s algorithm uses c*Sqrt(N) messages to obtain Mutual Exclusion. • c is a constant between 3 and 5 • Symmetric • Fully parallel operation

  5. Network assumptions • Error-free • FIFO channels [Messages between two nodes are delivered in the order sent]

  6. Decentralization • Two criteria for decentralized Mutual Exclusion • Equal Responsibility: Each node in the network bears an equal amount of responsibility to control Mutual Exclusion. • Equal Work: Each node must perform an equal amount of work to obtain Mutual Exclusion.

  7. Intuition behind Voting Sets • Any pair of mutual exclusion requests must be arbitrated so that at most one of the requesting nodes is given access. • The system consists entirely of identical nodes, which must share the responsibility of mutual exclusion [Decentralization] • Thus, any two requests must reach some common node. • This implies that the voting sets of any two nodes i and j, given by Si and Sj, must have a non-empty intersection.

  8. Voting Set Rules

  9. Voting Set Rules Every pair of Voting Sets should have at least one shared node. Each node’s Voting Set should contain the node itself.

  10. Voting Set Rule Details All Voting Sets should be of the same size, K. This ensures that each node does an equal amount of work to obtain mutual exclusion. Each node should appear in the same number of Voting Sets, D. This ensures that each node is equally responsible for mutual exclusion.

  11. Optimal Voting Set Size • The general idea is to express the maximum number of Voting Sets [the size of the set of Voting Sets] in terms of D and K, guided by the established set of rules. This evaluates to (D-1)K + 1. • This should equal the number of nodes, N, since we want exactly one Voting Set per node. • D is the degree of duplication (each node appears in D Voting Sets) and KN is the total number of memberships over all sets, so the number of sets is N = KN/D, which forces D = K. • Substituting D = K gives N = (K-1)K + 1, so K is approximately Sqrt(N).

  12. More Math The problem of finding a set of Si’s that satisfy all rules exactly is equivalent to finding a finite projective plane of N points. The finite projective plane result implies that the set of Si’s can comply with all the rules exactly IFF (K-1) is a power of a prime. In other cases some of the rules must be relaxed. In general, N = K(K-1) + 1, so K is on the order of Sqrt(N).
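
  As a concrete illustration (ours, not the slides'), the Python sketch below builds the smallest exact configuration, N = 7 with K = 3, from the lines of the Fano plane and verifies all of the Voting Set rules:

```python
# Voting Sets for the smallest exact case, N = 7 and K = 3: each set Si is
# a line of the Fano plane chosen so that it contains node i itself.
# Note N = K(K-1) + 1 = 7, and K - 1 = 2 is a prime power, as required.
voting_sets = {
    1: {1, 2, 3},
    2: {2, 4, 6},
    3: {3, 4, 7},
    4: {1, 4, 5},
    5: {2, 5, 7},
    6: {3, 5, 6},
    7: {1, 6, 7},
}

K = 3

# Rule: every pair of Voting Sets shares at least one node.
assert all(voting_sets[i] & voting_sets[j]
           for i in voting_sets for j in voting_sets)

# Rule: each node's Voting Set contains the node itself.
assert all(i in s for i, s in voting_sets.items())

# Rules: all sets have the same size K, and every node appears in
# exactly K sets (equal work and equal responsibility).
assert all(len(s) == K for s in voting_sets.values())
assert all(sum(n in s for s in voting_sets.values()) == K
           for n in voting_sets)
```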

  13. Algorithm Intuition • If node i can lock all members of its Voting Set Si, then no other node j can capture all of its members, since Sj intersects Si in at least one node. • If a node fails to capture all of its members, it waits until they are freed and then locks them. • To prevent deadlocks, requests are given priorities based on the timestamps of their REQUEST messages.

  14. Messages • REQUEST • The message sent by a node to request mutual exclusion • REQUEST messages are time-stamped and earlier ones get higher priority. • INQUIRE • Sent by a member node j to a requesting node i when j, already locked for i, receives another request that predates i’s. • Its purpose is to ask node i whether it can indeed lock all of its members. It is sent only once per lock. • RELINQUISH • The reply to INQUIRE when the inquired node cannot get all of its members. • RELEASE • The message sent by a node after it has completed its critical section • LOCKED • Sent from a member node to a requesting node when the member is not currently locked by another request. • FAILED • Sent from a member node to a requestor when the member is currently locked by a higher-priority request. • (A sketch of a member node’s handling of these messages follows below.)
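
  Below is a minimal sketch, in Python, of how a single Voting Set member might arbitrate these messages; the `Member` class, its method names, and the `send` callback are our illustration, not the paper's pseudocode:

```python
import heapq

class Member:
    """Sketch of one Voting Set member arbitrating requests.

    A request is a (timestamp, node_id) tuple; a smaller tuple means an
    earlier timestamp and therefore higher priority. send(req, msg) is
    a stand-in for the network layer.
    """

    def __init__(self):
        self.locked_by = None   # request currently holding this member's vote
        self.waiting = []       # min-heap of deferred requests
        self.inquired = False   # INQUIRE already sent for the current lock

    def on_request(self, req, send):
        if self.locked_by is None:
            self.locked_by = req
            send(req, "LOCKED")
            return
        heapq.heappush(self.waiting, req)
        if req < self.locked_by and not self.inquired:
            # A higher-priority request arrived: ask the current holder
            # whether it really managed to lock all of its members.
            self.inquired = True
            send(self.locked_by, "INQUIRE")
        elif req > self.locked_by:
            send(req, "FAILED")

    def on_relinquish(self, send):
        # The current holder gives the vote back; its request stays queued.
        heapq.heappush(self.waiting, self.locked_by)
        self._grant_next(send)

    def on_release(self, send):
        # The current holder has finished its critical section.
        self._grant_next(send)

    def _grant_next(self, send):
        self.inquired = False
        self.locked_by = heapq.heappop(self.waiting) if self.waiting else None
        if self.locked_by is not None:
            send(self.locked_by, "LOCKED")
```

  A requesting node enters its critical section once every member of its Voting Set has answered LOCKED, and it answers an INQUIRE with RELINQUISH only once it knows it cannot currently succeed, i.e. after receiving a FAILED.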

  15. Example (1)

  16. Example (2)

  17. Example (3)

  18. Example (4)

  19. Example (5)

  20. Proof of Mutual Exclusion • The paper contains a proof. • Essentially, as long as the network assumptions hold and the algorithm observes its specification, mutual exclusion is guaranteed.

  21. Deadlocks Deadlocks are eliminated in Maekawa’s algorithm by attacking the circular-wait condition. Because request timestamps are totally ordered, any wait cycle must contain at least one node whose REQUEST timestamp is preceded by those of both of its adjacent nodes in the cycle. Forcing such a node to back off (via INQUIRE and RELINQUISH) breaks the circular wait, so no deadlock can occur.

  22. Starvation • Starvation is a state of no progress. • For a node i in Maekawa’s algorithm, starvation would occur if i’s REQUESTs were continuously blocked by preceding REQUEST messages at various members of Si. • This is impossible, however, because there can be at most (K-1) preceding outstanding requests ahead of any request by a node, and therefore, in finite time, i’s request will be accommodated.

  23. Message Traffic – Light Demand • Contention is rare • For an instance of mutual exclusion • (K-1) REQUEST messages • (K-1) LOCKED messages • (K-1) RELEASE messages

  24. Message Traffic – Heavy Demand • At most (K-1) messages for each of {REQUEST, INQUIRE, FAILED, RELEASE, RELINQUISH} • Thus, a maximum of 5*(K-1) messages.
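
  As a worked check (our numbers, using the N = 7, K = 3 configuration from the Fano-plane sketch above): light demand costs 3*(K-1) = 6 messages per instance of mutual exclusion and heavy demand at most 5*(K-1) = 10, whereas Ricart and Agrawala always needs 2*(N-1) = 12.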

  25. Node Failure • Algorithm assumes that failures can be detected by other nodes and failed nodes are removed from the system. • A simple approach to deal with failure is to allow another node to take over the responsibilities of the failed node.

  26. Comparison

  27. Variations • A simplified version of the algorithm can achieve mutual exclusion in 2*Sqrt(N) messages if longer delays are tolerated. • This entails sending REQUEST messages to the Voting Set members one by one, in cyclic order, proceeding to the next member only after every REQUEST so far has been answered with LOCKED. • An additional Sqrt(N) messages are required to release the mutual exclusion.

  28. Critique + Message complexity is O(Sqrt(N)), which is better than Ricart and Agrawala’s O(N) for a decentralized protocol. + Symmetry: the responsibility on nodes to control mutual exclusion and the work needed to attain mutual exclusion are balanced. -- Does not specifically address the problems associated with node churn. Under heavy churn, the dynamic adjustments made to Voting Sets will rapidly degrade the ideal Maekawa setup, depending on the policies for removal and addition of nodes.

  29. A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems Authors: Shi-Ding Lin (Microsoft Research Asia) Zheng Zhang (Microsoft Research Asia) Qiao Lian (Tsinghua University) Ming Chen (Tsinghua University)

  30. Motivation • Emerging p2p applications (such as the Grid) built on top of DHTs introduce several new challenges when it comes to resource sharing • An important challenge that needs to be tackled in this context is providing mutual exclusion • Techniques used to provide mutual exclusion in traditional distributed systems are not completely applicable to p2p systems • Enforcing concurrency control with stable transaction servers is out of the question

  31. Quorum Consensus: Background • A quorum is a subgroup of replica managers whose size gives it the right to carry out operations • Quorum consensus is a replication scheme that provides all the benefits of replication, along with being able to handle network partitions

  32. Introduction • This paper proposes the Sigma protocol, implemented inside a dynamic p2p DHT • It uses queuing and cooperation between clients and replicas to enforce a quorum consensus scheme • It demonstrates the protocol’s scalability, resilience to network-latency variance, and fault tolerance

  33. Challenges • The open and dynamic nature of a wide-area p2p environment • Random resets, meaning the logical replica crashes and is replaced by one of its neighbors in the DHT

  34. System Model • The replicas are always available, but their internal states may be randomly reset • The number of clients is unpredictable and can be very large • Clients and replicas communicate via messages across unreliable channels; messages can be duplicated or lost

  35. System Model: Majority consensus in p2p DHT

  36. System Model: Assumptions • Clients are not malicious and are fail-stop • Messages cannot be forged • The typical lifetime of a DHT node is long enough that a client can talk directly to the current logical replica

  37. Strawman Protocol • A simple, ALOHA-like protocol. [Diagram: a client sends numbered REQUEST messages to each replica and enters the Critical Section after collecting enough votes]

  38. Strawman Protocol: Discussion • The purpose of this strawman is to show that the high variance of network latency between clients and replicas is responsible for the large performance degradation • Observed that robust mutual exclusion is entirely feasible with m=24, n=32 (i.e. m/n = 0.75) • The chance of breaking exclusivity is ~10^-40 • The performance, on the other hand, is very poor, as shown by the following graph…
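
  A minimal sketch of such an ALOHA-like strawman, assuming Python; the in-memory `Replica` stand-in and the function names are ours (the paper's replicas are DHT nodes reached over the network):

```python
import random
import time

class Replica:
    """Toy in-memory replica holding a single vote (illustrative only)."""

    def __init__(self):
        self.holder = None

    def request(self, client):
        # First come, first served: grant the vote only if it is free.
        if self.holder is None:
            self.holder = client
            return True
        return False

    def release(self, client):
        if self.holder == client:
            self.holder = None

def try_enter_cs(client, replicas, m, max_backoff=1.0):
    """ALOHA-like strawman: poll all n replicas and enter the critical
    section only once at least m of them grant their vote; otherwise
    give the partial votes back and retry after a random backoff."""
    while True:
        granted = [r for r in replicas if r.request(client)]
        if len(granted) >= m:            # quorum reached, e.g. m/n = 24/32
            return granted               # caller enters CS, releases later
        for r in granted:                # failed round: release votes
            r.release(client)
        time.sleep(random.uniform(0.0, max_backoff))
```

  The greedy collect-and-retry loop is exactly the behavior the next slides blame for poor performance: under latency variance, competing clients repeatedly snatch partial quorums from one another.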

  39. Strawman Protocol: Performance

  40. Strawman Protocol: Poor Performance Reasoning • Due to the variance of network latency between one client and each replica, requests will reach different replicas at different times: • Problem 1: Difficult to build a consistent view of competing clients • Problem 2: Greedy behavior of clients

  41. SIGMA Protocol • Addresses both problems faced by the Strawman protocol • Problem 1: Solved by installing a queue at each replica, which is shuffled to reach a consistent view under high contention • Problem 2: Solved by placing clients into an active waiting state

  42. SIGMA Protocol: Architecture

  43. SIGMA Protocol: YIELD Operation • This is an important performance-optimizing operation and shows the collaborative side of the protocol • YIELD = RELEASE + REQUEST • By issuing YIELD requests, clients collectively offer the replicas a chance to build a more consistent view and choose the right winner • This continues until a winner is chosen (a sketch follows below)
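
  A minimal sketch, in Python, of a replica-side queue with a YIELD operation; the `SigmaReplica` class and its simple move-to-tail yield discipline are our simplification, not the protocol's actual specification:

```python
from collections import deque

class SigmaReplica:
    """Sketch of a Sigma replica: queue requests and grant only the head.

    The real protocol additionally shuffles queues under contention to
    converge on a consistent winner and attaches leases to grants.
    """

    def __init__(self):
        self.queue = deque()

    def on_request(self, client):
        if client not in self.queue:
            self.queue.append(client)
        # Only the head of the queue holds this replica's permission.
        return self.queue[0] == client

    def on_yield(self, client):
        # YIELD = RELEASE + REQUEST: the head gives up its grant but stays
        # queued, giving the replicas a chance to pick a consistent winner.
        if self.queue and self.queue[0] == client:
            self.queue.rotate(-1)   # move the yielding client to the tail
        return bool(self.queue) and self.queue[0] == client

    def on_release(self, client):
        # The client finished its critical section (or gave up entirely).
        if client in self.queue:
            self.queue.remove(client)
```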

  44. SIGMA Protocol: With failures (1 of 2) • After a crash, a replica may vote a second time; this is handled by raising the m/n ratio • Replicas grant permission to a client using a renewable lease, to cover the case where a client crashes while in the CS • A message lost due to unreliable communication is treated the same as a client/replica crash

  45. SIGMA Protocol: With failures (2 of 2) • If a queue is destroyed, the replica’s state can be rebuilt using the following informed backoff mechanism: • The replica predicts the expected waiting time and notifies the client to retry after that time, using the empirical formula: T_w = T_CS * (P + 1/2) • P = client’s position in the queue • T_CS = average CS duration • T_w is updated upon the reception of each retry
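
  In code, the prediction is a one-liner (Python; the function name is ours):

```python
def expected_wait(position, avg_cs_duration):
    """Informed backoff: T_w = T_CS * (P + 1/2), the replica's prediction
    of how long a client at queue position P must wait before retrying."""
    return avg_cs_duration * (position + 0.5)

# Example: 3 clients ahead, average critical section of 2.0 s:
print(expected_wait(3, 2.0))  # 7.0 -> "retry in about 7 seconds"
```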

  46. SIGMA Protocol: Analysis • Service Policy – since client requests can take arbitrarily long to reach the replica, FCFS is not guaranteed • Safety – guaranteed with high probability, since the chance of a safety violation is negligible for an appropriate m/n ratio • Liveness – ensured through the lease mechanism (sketched below)
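
  A minimal sketch of such a renewable lease, assuming Python (the `Lease` class and its default duration are illustrative, not taken from the paper):

```python
import time

class Lease:
    """Sketch of the renewable lease behind the liveness guarantee."""

    def __init__(self, duration=10.0):
        self.duration = duration
        self.expires = time.monotonic() + duration

    def renew(self):
        # A live client renews before expiry to keep its permission.
        self.expires = time.monotonic() + self.duration

    def expired(self):
        # If the holder crashed inside its CS, the replica reclaims the
        # vote as soon as the lease runs out.
        return time.monotonic() >= self.expires
```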

  47. Experimental Results (1 of 3) • The Sigma protocol is fully implemented and deployed on a distributed testbed • Assumes an infinite pool of clients, each firing requests to enter the CS according to a Poisson process with incoming request rate λ • After a 5-minute warm-up, throughput (the number of serviced requests per second) is measured over 10 minutes; this is repeated for different incoming request rates

  48. Experimental Results (2 of 3)

  49. Experimental Results (3 of 3)

  50. Novel Contributions • Showed that high variance of network latency between clients and replicas causes performance degradation in an ALOHA-like strawman protocol • Demonstrated that a cooperative strategy between clients and replicas is necessary to overcome this problem and to achieve scalability and robustness • Proposed an informed backoff mechanism to rebuild a replica’s state after a crash
