
Today: Distributed Coordination



Presentation Transcript


1. Today: Distributed Coordination
• Previous class: Distributed File Systems
• Issues:
  • Naming strategies: absolute names, mount points (a logical connection between client and server), global names
  • File caching: in memory, on local disk
  • Cache update policies: write-back, write-through
  • Case study: Sun Microsystems NFS
• Today: distributed coordination

2. What is distributed coordination?
• In previous lectures we discussed various mechanisms to synchronize the actions of processes on one machine
  • Mutual exclusion: semaphores, locks, monitors
  • Ways of dealing with deadlocks:
    • ignoring them,
    • detection (let deadlocks occur, detect them, and try to recover),
    • prevention (statically make deadlock structurally impossible),
    • avoidance (avoid deadlock by allocating resources carefully)
• These mechanisms have all been centralized
• Distributed coordination can be seen as a generalization of these mechanisms to distributed systems

3. Event Ordering
• Being able to order events is important for synchronization, e.g., we need to be able to specify that a resource can only be used after it has been granted
• In a centralized system it is possible to determine the order of events
  • This is because all processes share a common clock and memory
• In a distributed system there is no common clock
  • It is therefore sometimes impossible to tell which of two events occurred first

4. Event order: The Happened-Before Relation
• Happened-before is denoted with an arrow, e.g., A -> B
• If A and B are events in the same process, and A was executed before B, then A -> B
• If A is the event of sending a message and B is the event of receiving that message, then A -> B
• If A -> B and B -> C, then A -> C
• If two events A and B are not related by the -> relation, then these events were executed concurrently
  • We don't know which of these two events happened first

5. Example: Space-time diagram of three distributed processes
• [Figure: space-time diagram of three processes P, Q, and R, with events p0-p3, q0-q3, and r0-r3 connected by message arrows]

6. Example cont.
• Ordered events:
  • p0 -> q1; r0 -> q3; q2 -> r3; q0 -> p3
  • And also p0 -> q3 (as p0 -> q1 AND q1 -> q3) ...
• Concurrent events:
  • q0 and p2
  • r0 and q2
  • p1 and q2
• Since neither event affects the other, it is NOT important to know which happened first

7. Implementation of Event Ordering
• We would need either a COMMON CLOCK or PERFECTLY SYNCHRONIZED CLOCKS to determine event ordering in distributed systems
  • Unfortunately, neither is available or possible!
• How can we define the happened-before relation WITHOUT physical clocks in distributed systems?

8. Implementation of Event Ordering
• We define a logical clock, LCi, for each process Pi
• We associate a timestamp with each event
• We advance the logical clocks when sending messages to account for slower logical clocks, i.e., if A sends to B and B's clock is less than A's timestamp, we advance LC(B) to LC(A) + 1
• Now we can meet the global-ordering requirement: if A -> B, then A's timestamp < B's timestamp
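
A minimal sketch of such a logical clock in Python (the class and method names are illustrative, not from the lecture; it only demonstrates the advance-on-receive rule above):

    # Lamport-style logical clock: one instance per process Pi.
    class LogicalClock:
        def __init__(self):
            self.time = 0                      # LCi for this process

        def tick(self):
            self.time += 1                     # advance on every local event
            return self.time

        def send(self):
            return self.tick()                 # timestamp carried by the message

        def receive(self, msg_timestamp):
            # If our clock is behind the sender's timestamp, jump past it,
            # so that A -> B always implies timestamp(A) < timestamp(B).
            self.time = max(self.time, msg_timestamp) + 1
            return self.time

    a, b = LogicalClock(), LogicalClock()
    ts = a.send()                              # A sends a message, ts == 1
    print(b.receive(ts))                       # 2: B's clock is advanced past A's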

9. Mutual Exclusion
• How can we provide mutual exclusion across distributed processes?
• 1. Centralized approach
  • One of the processes acts as coordinator
  • To enter a critical section, a process sends a Request message and waits for a Reply. If some process is already in the critical section, the coordinator queues the request.
  • To leave the critical section, the process sends a Release message
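
A minimal sketch of the coordinator's bookkeeping in Python (message passing is abstracted as method calls and the names are illustrative; this is not the full protocol):

    from collections import deque

    class Coordinator:
        def __init__(self):
            self.holder = None                 # process currently in the critical section
            self.queue = deque()               # queued Request messages

        def request(self, pid):
            """Returns True if a Reply is sent immediately, False if queued."""
            if self.holder is None:
                self.holder = pid
                return True
            self.queue.append(pid)
            return False

        def release(self, pid):
            """Handles a Release; returns the next process to Reply to, if any."""
            assert self.holder == pid
            self.holder = self.queue.popleft() if self.queue else None
            return self.holder

    c = Coordinator()
    print(c.request("P1"))                     # True: P1 enters
    print(c.request("P2"))                     # False: P2 is queued
    print(c.release("P1"))                     # P2: coordinator replies to P2 next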

10. Centralized approach for mutual exclusion
• Advantages:
  • Relatively small overhead
  • Ensures mutual exclusion
  • If scheduling is fair, no starvation occurs
• Disadvantages:
  • The coordinator can fail
  • A new coordinator must then be ELECTED
  • Once the new coordinator is elected, it must poll all the processes to reconstruct the request queue

11. Fully Distributed approach for mutual exclusion
• A far more complicated solution
• When a process Pi wants to enter its critical section, it generates a new timestamp TS and sends a message Request(Pi, TS) to all processes
• A process can enter the critical section once it has received Reply messages from all other processes
• Process Pj may not reply immediately:
  • because it is already in its critical section, or
  • because it also wants to enter its critical section; it then compares timestamps and, if its own is smaller, the Reply is deferred
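
A minimal sketch in Python of Pj's decision when Request(Pi, TS) arrives (names are illustrative, and ties are broken by process id, which is an added assumption beyond the slide):

    def handle_request(pj, req_ts, req_pid):
        """pj describes Pj's own state; returns 'reply' or 'defer'."""
        if pj["in_cs"]:
            return "defer"                     # already in the critical section
        if pj["wants_cs"]:
            own = (pj["own_ts"], pj["own_pid"])
            if own < (req_ts, req_pid):        # Pj's own request is older
                return "defer"
        return "reply"                         # otherwise reply immediately

    pj = {"in_cs": False, "wants_cs": True, "own_ts": 5, "own_pid": 2}
    print(handle_request(pj, 7, 1))            # defer: Pj's timestamp 5 is smaller
    print(handle_request(pj, 3, 1))            # reply: Pi's request is older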

12. Fully Distributed Approach
• Advantages:
  • Mutual exclusion is ensured
  • Starvation free (scheduling is based on timestamps)
  • Deadlock free
• Disadvantages:
  • All processes must know each other
  • If one process fails, the system collapses. Continuous monitoring of the state of all processes is needed to detect when a process fails.
  • Suitable only for a small number of processes

13. Token-Passing approach to mutual exclusion
• A token (a special type of message) circulates among all processes
• Processes are logically organized in a ring
• If a process does not need to enter a critical section, it passes the token to its neighbor
• Advantages: in a highly loaded system a single message per critical-section entry may be enough; starvation free ...
• Disadvantages: if a process fails, a new logical ring must be established; in a system with low contention (no process wants to enter its critical section) the number of messages per critical-section entry can be very large
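
A toy simulation of the token circulating around the logical ring, in Python (purely illustrative; in a real system the token is passed as a network message):

    def circulate(ring, wants_cs, laps=1):
        """ring: process ids in ring order; wants_cs: ids wanting the critical section."""
        holder = 0                             # index of the process holding the token
        for _ in range(laps * len(ring)):
            pid = ring[holder]
            if pid in wants_cs:
                print(pid, "enters and leaves the critical section")
                wants_cs.discard(pid)
            holder = (holder + 1) % len(ring)  # pass the token to the neighbor

    circulate(["P1", "P2", "P3", "P4"], wants_cs={"P3"})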

14. Deadlock handling with deadlock prevention
• Deadlock avoidance is not practical: it requires information about resource usage ahead of time that is rarely available
• Deadlock prevention
  • Can use the local algorithms with modifications
  • For example, we can use the resource-ordering technique (ensuring that resources are accessed in order), but first we need to define a global ordering among resources
• Newer techniques use timestamp ordering:
  • The wait-die scheme
    • Non-preemptive
    • If the TS of Pi is smaller than the TS of Pj, and the resource Pi is requesting is held by Pj, then Pi may wait for the resource. Otherwise Pi is rolled back (restarted).
  • The wound-wait scheme
    • Preemptive
    • The opposite of wait-die: Pi waits if its TS is larger than Pj's; otherwise Pj is rolled back and the resource is preempted from Pj
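
A minimal sketch of the two decisions in Python, for the case where Pi requests a resource currently held by Pj (a smaller timestamp means an older process; names are illustrative):

    def wait_die(ts_i, ts_j):
        # Non-preemptive: only an older requester may wait; a younger one is rolled back.
        return "Pi waits" if ts_i < ts_j else "Pi is rolled back (dies)"

    def wound_wait(ts_i, ts_j):
        # Preemptive: an older requester preempts (wounds) the younger holder;
        # a younger requester simply waits.
        return "Pj is rolled back (wounded)" if ts_i < ts_j else "Pi waits"

    print(wait_die(3, 7))                      # Pi waits (Pi is older)
    print(wait_die(9, 7))                      # Pi is rolled back (Pi is younger)
    print(wound_wait(3, 7))                    # Pj is rolled back (Pi is older)
    print(wound_wait(9, 7))                    # Pi waits (Pi is younger)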

15. Deadlock handling with deadlock detection
• Deadlock prevention may preempt resources even when no deadlock has occurred!
• Deadlock detection is based on so-called wait-for graphs
  • A wait-for graph shows the resource-allocation state
  • A cycle in the wait-for graph represents a deadlock
• [Figure: local wait-for graphs at Site A and Site B, involving processes P1-P5]

16. Global wait-for graphs
• To show that there is NO DEADLOCK it is not enough to show that there is no cycle locally
• We need to construct the global wait-for graph
  • It is the union of all local graphs
• [Figure: the global wait-for graph over processes P1-P5, obtained as the union of the local graphs]
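
A minimal sketch in Python of taking the union of local wait-for graphs and checking it for a cycle (the example edges and names are illustrative, not the graphs from the figures):

    def union_graphs(local_graphs):
        """Each local graph maps a process to the set of processes it waits for."""
        global_graph = {}
        for g in local_graphs:
            for p, waits_for in g.items():
                global_graph.setdefault(p, set()).update(waits_for)
        return global_graph

    def has_cycle(graph):
        """Depth-first search with a recursion stack; a back edge means a deadlock."""
        visiting, done = set(), set()
        def dfs(node):
            visiting.add(node)
            for nxt in graph.get(node, ()):
                if nxt in visiting or (nxt not in done and dfs(nxt)):
                    return True
            visiting.discard(node)
            done.add(node)
            return False
        return any(dfs(n) for n in graph if n not in done)

    site_a = {"P1": {"P2"}, "P2": {"P3"}}      # no local cycle at Site A
    site_b = {"P3": {"P1"}}                    # no local cycle at Site B
    print(has_cycle(site_a))                   # False
    print(has_cycle(union_graphs([site_a, site_b])))  # True: P1 -> P2 -> P3 -> P1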

17. How to construct this global wait-for graph?
• Centralized approach:
  • The graph is maintained in ONE process: the deadlock-detection coordinator
  • Since there are communication delays in the system, we distinguish two graphs:
    • The real wait-for graph // the real but unknown state of the system
    • The constructed wait-for graph // the approximation generated by the coordinator during the execution of its algorithm
• When is the wait-for graph constructed?
  • 1. Whenever a local edge is inserted or removed, a message is sent
  • 2. Periodically
  • 3. Whenever the coordinator invokes the cycle-detection algorithm
• What happens if a cycle is detected?
  • The coordinator selects a victim and notifies all processes

18. Centralized approach to deadlock detection
• False cycles may exist in the constructed global wait-for graph: because of message delays, a message reporting the removal of an edge may arrive after a message reporting the insertion of another edge, so the coordinator can briefly see a cycle that never existed in the real graph
• There is a centralized deadlock-detection algorithm based on Option 3 that is guaranteed to detect all deadlocks and to report no false deadlocks

19. Summary
• Event ordering in distributed systems
• Various approaches to mutual exclusion in distributed systems
  • Centralized approach
  • Token-based approach
  • Fully distributed approach
• Deadlock prevention and detection
  • Global wait-for graph
