
Network algorithms


Presentation Transcript


  1. Network algorithms Presenter: Kurchi Subhra Hazra

  2. Agenda • Basic Algorithms such as Leader Election • Consensus in Distributed Systems • Replication and Fault Tolerance in Distributed Systems • GFS as an example of a Distributed System

  3. Network Algorithms • A Distributed System is a collection of entities that • are autonomous, asynchronous and failure-prone • communicate through unreliable channels • cooperate to perform some common function • Network algorithms enable such distributed systems to perform these “common functions” effectively

  4. Global State in Distributed Systems • We want to estimate a “consistent” state of a distributed system • Required for determining whether the system is deadlocked or terminated, and for debugging • Two approaches: • 1. Centralized: all processes and channels report to a central process • 2. Distributed: the Chandy-Lamport algorithm

  5. Chandy-Lamport Algorithm Based on marker messages M. On receiving M over channel c: If own state is not yet recorded: a) Record own state b) Start recording the state of incoming channels c) Send marker messages on all outgoing channels. Else: a) Record the state of c (the messages that arrived on c since recording began)
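A minimal Python sketch of the marker rule above, assuming reliable FIFO channels; all names (Process, on_marker, on_message) are illustrative, not from the slides:

```python
# Toy single-process view of the Chandy-Lamport marker rule.
# Assumes reliable FIFO channels; all names are illustrative.

class Process:
    def __init__(self, pid, incoming, outgoing, state):
        self.pid = pid
        self.incoming = incoming        # ids of incoming channels
        self.outgoing = outgoing        # ids of outgoing channels
        self.state = state              # local application state
        self.recorded_state = None      # snapshot of own state
        self.channel_state = {}         # channel id -> recorded messages
        self.recording = set()          # channels still being recorded

    def on_marker(self, c, send):
        if self.recorded_state is None:
            # First marker seen: record own state, start recording every
            # other incoming channel, and send markers downstream.
            self.recorded_state = self.state
            self.channel_state[c] = []            # state of c is empty
            self.recording = set(self.incoming) - {c}
            for out in self.outgoing:
                send(out, "MARKER")
        else:
            # Already recording: c's state is whatever arrived on it
            # since recording began; stop recording c.
            self.channel_state.setdefault(c, [])
            self.recording.discard(c)

    def on_message(self, c, msg):
        # Application message: record it if channel c is being recorded.
        if c in self.recording:
            self.channel_state.setdefault(c, []).append(msg)
```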

  6. Chandy-Lamport Algorithm: Example (space-time diagram of processes P1, P2, P3 exchanging messages a and b; taken from CS 425/UIUC/Fall 2009)
  1. P1 initiates snapshot: records its state (S1); sends Markers to P2 & P3; turns on recording for channels Ch21 and Ch31
  2. P2 receives Marker over Ch12, records its state (S2), sets state(Ch12) = {}, sends Marker to P1 & P3; turns on recording for channel Ch32
  3. P1 receives Marker over Ch21, sets state(Ch21) = {a}
  4. P3 receives Marker over Ch13, records its state (S3), sets state(Ch13) = {}, sends Marker to P1 & P2; turns on recording for channel Ch23
  5. P2 receives Marker over Ch32, sets state(Ch32) = {b}
  6. P3 receives Marker over Ch23, sets state(Ch23) = {}
  7. P1 receives Marker over Ch31, sets state(Ch31) = {}

  7. Leader Election • Suppose you want to: elect a master server out of n servers, or elect a co-ordinator among different mobile systems • Common leader election algorithms: Ring Election, Bully Election • Two requirements: • Safety (the process with the best attribute is elected) • Liveness (the election terminates)

  8. Ring Election • Processes are organized in a ring • A process starts an election by sending a message clockwise to the next process in the ring, carrying its own id and attribute value • The next process checks the election message: • if its own attribute value is greater, it replaces the id and value in the message with its own • if its own attribute value is less, it simply passes the message on • if the attribute value equals its own (the message has come full circle), it declares itself the leader and passes on an “elected” message • What happens when a node fails?
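A toy sketch of this loop in Python, using the process id itself as the attribute (a simplification; the function name and setup are illustrative):

```python
# Illustrative ring election: the message travels clockwise, carrying
# the best id seen so far; it stops when it returns to its owner.

def ring_election(ids, starter):
    """Simulate one election round; returns the elected leader."""
    n = len(ids)
    pos = ids.index(starter)
    msg = starter                  # election message: best id seen
    while True:
        pos = (pos + 1) % n        # deliver to next process in the ring
        me = ids[pos]
        if msg == me:
            return me              # my own id came back: I am the leader
        msg = max(msg, me)         # replace with my id if mine is greater

print(ring_election([3, 1, 4, 5, 2], starter=1))   # -> 5
```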

  9. Ring Election - Example Taken from CS 425/UIUC/Fall 2009

  10. Ring Election - Example Taken from CS 425/UIUC/Fall 2009

  11. Bully Algorithm Best case and worst case scenarios Taken from CS 425/UIUC/Fall 2009
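The slide above shows only the figure. As a rough summary: in the best case the failure is detected by the process with the highest surviving id, which announces itself immediately; in the worst case the lowest-id process detects it and elections cascade through the ranks, costing O(n²) messages. A toy sketch of the takeover logic (all names illustrative):

```python
# Illustrative bully election over a set of live process ids.
# A detector challenges all higher ids; if none exists it wins;
# otherwise a higher process takes over, and the cascade continues
# until the highest live id announces itself.

def bully_election(live_ids, detector):
    higher = [p for p in live_ids if p > detector]
    if not higher:
        return detector            # best case: detector is highest alive
    # Worst case (lowest id detects): the election cascades upward
    # through every live process.
    return bully_election(live_ids, min(higher))

print(bully_election([1, 2, 4, 6], detector=1))   # -> 6
```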

  12. Consensus • A set of n processes/systems attempt to “agree” on some information • Pi begins in an undecided state and proposes a value vi ∈ D • The Pi communicate by exchanging values • Pi sets its decision value di and enters the decided state • Requirements: 1. Termination: eventually all correct processes decide, i.e., each correct process sets its decision variable 2. Agreement: the decision value of all correct processes is the same 3. Integrity: if all correct processes proposed v, then any correct decided process has di = v

  13. 2 Phase Commit Protocol • Useful in distributed transactions to perform an atomic commit • Atomic commit: a set of distinct changes applied in a single operation • Suppose A transfers $300 from A’s account to B’s account: • A = A - 300 • B = B + 300 • These operations must be applied atomically for consistency: either both happen or neither does

  14. 2 Phase Commit Protocol What happens if the co-ordinator and a participant fail after doCommit?
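A toy sketch of the two phases with in-memory participants (all names illustrative); the comment marks the crash window the question above refers to:

```python
# Toy 2PC co-ordinator: phase 1 collects votes (canCommit?),
# phase 2 broadcasts the decision (doCommit / doAbort).

class Participant:
    def __init__(self, name, will_commit=True):
        self.name, self.will_commit = name, will_commit
        self.committed = False

    def can_commit(self):          # phase 1: vote
        return self.will_commit

    def do_commit(self):           # phase 2: apply the change
        self.committed = True

    def do_abort(self):
        self.committed = False

def two_phase_commit(coordinator_log, participants):
    # Phase 1: ask every participant; any "no" aborts the transaction.
    if all(p.can_commit() for p in participants):
        coordinator_log.append("commit")    # decision point
        for p in participants:
            p.do_commit()          # a crash here leaves survivors unsure
                                   # whether others committed (slide 17)
        return "committed"
    coordinator_log.append("abort")
    for p in participants:
        p.do_abort()
    return "aborted"

log = []
a, b = Participant("A"), Participant("B")
print(two_phase_commit(log, [a, b]))   # -> committed
```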

  15. Issue with 2PC The co-ordinator sends canCommit? to participants A and B

  16. Issue with 2PC A and B both reply Yes to the co-ordinator

  17. Issue with 2PC The co-ordinator sends doCommit; A crashes, the co-ordinator crashes, and B commits. A new co-ordinator cannot know whether A had committed.

  18. 3 Phase Commit Protocol (3PC) Uses an additional stage between voting and commit

  19. 3PC Cont… Message flow: the co-ordinator sends canCommit to Cohorts 1-3, then preCommit; the cohorts reply with acks; finally the co-ordinator sends commit to every cohort
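A toy sketch of the extra stage (state names and structure are illustrative): once every cohort has acked preCommit, the decision is known everywhere, so a crashed co-ordinator can be replaced without blocking:

```python
# Toy 3PC co-ordinator: canCommit -> preCommit -> doCommit.
# Cohorts are plain dicts; states and names are illustrative.

def three_phase_commit(cohorts):
    # Phase 1: voting (canCommit). Any "no" aborts.
    if not all(c["votes_yes"] for c in cohorts):
        return "aborted"
    # Phase 2: preCommit. Once every cohort has acked, everyone
    # *knows* the decision is commit; a replacement co-ordinator
    # can finish the protocol instead of blocking.
    for c in cohorts:
        c["state"] = "PRECOMMIT"
    # Phase 3: doCommit.
    for c in cohorts:
        c["state"] = "COMMITTED"
    return "committed"

cohorts = [{"votes_yes": True, "state": "INIT"} for _ in range(3)]
print(three_phase_commit(cohorts))   # -> committed
```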

  20. 3PC Cont… • Why is this better? • 2PC: execute the transaction when everyone is willing to COMMIT it • 3PC: execute the transaction when everyone knows it will COMMIT (http://www.coralcdn.org/07wi-cs244b/notes/l4d.txt) • But 3PC is expensive • Timeouts are triggered by slow machines

  21. Paxos Protocol • A consensus algorithm • Important safety conditions: • Only one value is chosen • Only a proposed value is chosen • Important liveness conditions: • Some proposed value is eventually chosen • Once a value is chosen, a process can eventually learn it • Nodes act as Proposers, Acceptors and Learners

  22. Paxos Protocol – Phase 1 The proposer selects a number n for a proposal of value v and sends a prepare message to the acceptors. Each acceptor acknowledges with the highest n it has seen. Not every acceptor needs to respond: a majority of acceptors is enough.

  23. Paxos Protocol – Phase 2 Once a majority of acceptors agree on proposal n with value v, the proposer sends an accept request carrying n and v to the acceptors.

  24. Paxos Protocol – Phase 2 The acceptors accept the proposal once a majority of acceptors agree on proposal n with value v. What if v is null? (Then the proposer is free to choose its own value.)
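A toy single-decree sketch of both phases over in-memory acceptors (class and function names are illustrative, not Lamport's notation):

```python
# Toy single-decree Paxos: prepare/promise (phase 1), accept (phase 2).

class Acceptor:
    def __init__(self):
        self.promised = 0          # highest n promised so far
        self.accepted = None       # (n, v) of highest accepted proposal

    def prepare(self, n):          # phase 1b
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("reject", None)

    def accept(self, n, v):        # phase 2b
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, v)
            return "accepted"
        return "reject"

def propose(acceptors, n, v):
    # Phase 1a: prepare; need promises from a majority.
    replies = [a.prepare(n) for a in acceptors]
    promises = [acc for tag, acc in replies if tag == "promise"]
    if len(promises) <= len(acceptors) // 2:
        return None
    # If some acceptor already accepted a value, adopt the one with the
    # highest proposal number; otherwise v is "null" and we use our own.
    prior = [acc for acc in promises if acc is not None]
    if prior:
        v = max(prior)[1]
    # Phase 2a: accept; the value is chosen once a majority accepts.
    acks = [a.accept(n, v) for a in acceptors]
    return v if acks.count("accepted") > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, n=1, v="x"))   # -> 'x'
print(propose(acceptors, n=2, v="y"))   # -> 'x' (earlier choice wins)
```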

  25. Paxos Protocol Cont… • What if an arbitrary number of proposers is allowed? Round 1: proposer P issues proposal n1; Round 2: proposer Q issues n2, pre-empting it at the acceptor; and so on

  26. Paxos Protocol Cont… • What if an arbitrary number of proposers is allowed? P and Q can keep pre-empting each other round after round (n3, n4, …), so no proposal completes • To ensure progress, use a distinguished proposer

  27. Paxos Protocol Contd… • Some issues: • How do we choose the proposer? • How do we ensure a unique n? • Expensive protocol • No primary if a distinguished proposer is used • Originally used by the Paxons to run their part-time parliament (Lamport’s “The Part-Time Parliament”)

  28. Replication • Replication is important for • Fault Tolerance • Load Balancing • Increased Availability Requirements: • Transparency • Consistency

  29. Failure in Distributed Systems • An important consideration in every design decision • Fault detectors should be: • Complete: able to detect a fault when it occurs • Accurate: does not raise false positives
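As a sketch, a heartbeat-based detector illustrates the trade-off: with timeouts it is complete (a crashed process eventually stops sending and is suspected) but not perfectly accurate (a slow process can be falsely suspected). Names are illustrative:

```python
# Illustrative heartbeat failure detector: suspects a process when no
# heartbeat has arrived within `timeout` seconds.

import time

class HeartbeatDetector:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, pid):
        # Record the arrival time of a heartbeat from pid.
        self.last_seen[pid] = time.monotonic()

    def suspects(self, pid):
        # Suspect pid if we never heard from it or it has gone quiet.
        last = self.last_seen.get(pid)
        return last is None or time.monotonic() - last > self.timeout

d = HeartbeatDetector(timeout=2.0)
d.heartbeat("p1")
print(d.suspects("p1"))   # False: heard from p1 recently
print(d.suspects("p2"))   # True: never heard from p2
```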

  30. Byzantine Faults • Arbitrary messages and transitions • Cause: e.g., software bugs, malicious attacks • Byzantine Agreement Problem: “Can a set of concurrent processes achieve coordination in spite of the faulty behavior of some of them?” • Concurrent processes could be replicas in distributed systems

  31. Practical Byzantine Fault Tolerance (PBFT) • A replication algorithm that is able to tolerate Byzantine faults • Useful for software faults • Why “Practical”? Because it can be used in an asynchronous environment like the Internet • Important assumptions: • At most f of the 3f+1 replicas can be faulty • All replicas start in the same state • Failures are independent. Practical?

  32. PBFT Cont… Normal-case message flow (C: client, R1: primary replica, R2-R4: backups): the client C sends a request to R1; R1 multicasts pre-prepare to the replicas; the replicas exchange prepare messages (a replica is prepared after accepting 2f prepares), then commit messages (execution after 2f+1 commits); finally the replicas reply to the client, which blocks and waits for f+1 matching replies

  33. PBFT Cont… • The algorithm provides • Safety: by guaranteeing linearizability; pre-prepare and prepare ensure a total order on messages • Liveness: by providing for a view change when the primary replica fails (here, synchrony is assumed) • How do we know the value of f a priori?
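The quorum sizes in the figure follow from the standard n = 3f + 1 assumption; f is fixed by the chosen replica count rather than learned at runtime, which is one answer to the question above. A small sketch of the arithmetic (function name illustrative):

```python
# Quorum arithmetic behind PBFT under the standard n = 3f + 1 sizing.

def pbft_quorums(n):
    f = (n - 1) // 3               # max Byzantine replicas tolerated
    return {
        "faulty_tolerated": f,
        "prepare_quorum": 2 * f,       # prepares accepted before commit
        "commit_quorum": 2 * f + 1,    # commits before execution
        "client_replies": f + 1,       # matching replies client waits for
    }

print(pbft_quorums(4))
# {'faulty_tolerated': 1, 'prepare_quorum': 2,
#  'commit_quorum': 3, 'client_replies': 2}
```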

  34. Google File System • Revisited traditional file system design: 1. Component failures are the norm 2. Multi-GB files are common 3. Files are mutated by appending new data 4. Relaxed consistency model

  35. GFS Architecture (Architecture diagram; labels: Leader Election/Replication. The master maintains metadata: the namespace, chunk metadata, etc.)

  36. GFS – Relaxed Consistency

  37. GFS – Design Issues • Rationale: keep things simple • Single master. Problems: increasing volume of underlying storage -> increase in metadata; clients not as fast as the master server -> the master server became a bottleneck • Current: multiple masters per data center • Ref: http://queue.acm.org/detail.cfm?id=1594206

  38. GFS Design Issues • Replication of chunks • Replication across racks; the default number of replicas is 3 • Allowing concurrent changes to the same file: in retrospect, they would rather have had a single writer • The primary replica serializes mutations to chunks; they do not use any of the consensus protocols before applying mutations to the chunks • Ref: http://queue.acm.org/detail.cfm?id=1594206

  39. Thank You
