
Distributed Algorithms 2014 Igor Zarivach

A Distributed Algorithm for Minimum Weight Spanning Trees by Gallager, Humblet, Spira (GHS). Distributed Algorithms 2014, Igor Zarivach. Agenda: Introduction; Review of spanning trees; Description of GHS algorithm; Algorithm execution on ring topology; Complexity analysis.





Presentation Transcript


  1. A Distributed Algorithm for Minimum Weight Spanning Trees by Gallager, Humblet, Spira (GHS) • Distributed Algorithms 2014, Igor Zarivach

  2. Agenda • Introduction • Review of spanning trees • Description of GHS algorithm • Algorithm execution on ring topology • Complexity analysis

  3. Dijkstra Prize in 2004 • An elegant and efficient distributed algorithm for finding a minimum spanning tree in an asynchronous network. • The problem is important, both theoretically and practically • Major algorithmic breakthrough on many fronts: • It solved the fundamental problem of symmetry breaking (or leader election) in the setting of a general graph • The algorithm has a surprisingly low message complexity for this important problem • Techniques for multicasting, and for query and reply • Beauty and elegance of the algorithm and its presentation • An exceptional degree of asynchrony among the nodes • Its structure is very intuitive and easy to comprehend • The algorithm is sufficiently complicated and interesting to be a challenge problem for formal verification methods • Finding a proof is still very much an open problem in protocol verification and formal methods • In summary, this paper is a genuine milestone in the area of asynchronous network algorithms; it has changed this field completely, in terms of both algorithmics and analysis techniques.

  4. Problem Statement • Given: a connected undirected input graph G(V, E) with N = |V| nodes and |E| edges, each with a distinct finite weight. • Goal: find an asynchronous distributed algorithm that determines the minimum spanning tree (MST) of the graph.
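To check the output of any implementation, a centralized reference is handy: with distinct weights the MST is unique (Property 2, slide 9), so a correct GHS run must produce the same edge set as a sequential algorithm. Below is a minimal sketch in Python using Kruskal's algorithm. Kruskal is a sequential method and is not part of GHS. The function name and the (u, v, weight) tuple format are assumptions.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight); nodes are 0..n-1


def kruskal_mst(n: int, edges: List[Edge]) -> List[Edge]:
    """Centralized reference MST (sequential, not distributed)."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different components
            parent[ru] = rv
            mst.append((u, v, w))
    return mst
```

For example, on a 4-node ring with weights 1, 2, 3, 4 and a chord of weight 5, the reference tree keeps the edges of weight 1, 2 and 3.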

  5. Minimum (Weight) Spanning Tree [Figure: example graph with edge weights 1 to 16 and its minimum spanning tree]

  6. Applications • Efficient broadcasting in networks • Establishing connectivity after node failures • Leader election

  7. Model • Communication • Asynchronous communication • Message passing • Messages can pass over an edge in both directions concurrently • Computation • Processors are represented by nodes • Assumption: distinct weights on edges (we will see why) • Each processor knows the weights of its incident edges • Each processor knows its unique ID • One or more nodes can start the algorithm • Failures • Messages arrive in order, with no errors • No processor faults

  8. Definitions • Fragment: a subtree of the MST • Branch: an edge in the MST (an edge of a fragment) • Outgoing edge: an edge between different fragments • Fragment’s MWOE: its Minimum Weight Outgoing Edge [Figure: Fragments 1, 2 and 3, with a root, a branch, an outgoing edge and an MWOE marked]

  9. Two properties of MSTs • Property 1: Given a fragment F of an MST, let e be a minimum-weight outgoing edge of F; then joining e and its adjacent non-fragment node to F yields another fragment F’ of an MST. • Property 2: If all the edges of a connected graph have different weights, then the MST of the graph is unique. [Figure: fragment F extended by its MWOE e into fragment F’]

  10. Algorithm GHS: High Level • Each fragment finds its MWOE asynchronously • When the MWOE is found, the fragment attempts to combine with the fragment at the other end • We will show how and when to combine the fragments so that the algorithm is correct and has good message complexity
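The "find MWOE, then combine" loop can be illustrated by a synchronous, centralized simplification, which is essentially Borůvka's algorithm: in each round every fragment picks its MWOE and all fragments merge along the chosen edges. This is only a sketch. GHS itself runs asynchronously and uses Levels (slide 17) to decide how fragments combine. The function name and data layout are assumptions, and the graph is assumed connected with distinct weights (otherwise the loop would not terminate).

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight); weights assumed distinct


def mwoe_rounds(n: int, edges: List[Edge]) -> Set[Edge]:
    """Synchronous rounds: every fragment picks its MWOE, then all merge."""
    fragment = list(range(n))  # fragment[x] = id of the fragment holding x
    tree: Set[Edge] = set()
    while len(set(fragment)) > 1:
        best: Dict[int, Edge] = {}
        for e in edges:
            u, v, w = e
            fu, fv = fragment[u], fragment[v]
            if fu == fv:
                continue  # internal edge, not outgoing
            for f in (fu, fv):
                if f not in best or w < best[f][2]:
                    best[f] = e
        # Property 1: every chosen MWOE belongs to the MST.
        tree.update(best.values())
        # Merge along tree edges: relabel until each component has one id.
        changed = True
        while changed:
            changed = False
            for u, v, _ in tree:
                a, b = fragment[u], fragment[v]
                if a != b:
                    low, high = min(a, b), max(a, b)
                    fragment = [low if f == high else f for f in fragment]
                    changed = True
    return tree
```

With distinct weights no cycle can form among the chosen MWOEs, which is why merging all of them at once is safe in this simplified setting.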

  11. Distinct weight edges: will the algorithm work for equal weight edges? • If edges are not distinct, but nodes have distinct identities, then for an edge e = (u, v) with ID(u) < ID(v), let w’(e) = (w(e), ID(u), ID(v)) • We get distinct edge weights by comparing w’ lexicographically: ties in w are broken by the node IDs • If both edges and nodes are not distinct, there is no distributed algorithm to find the MST • Any two edges form an MST, but there is no way to break the symmetry [Figure: example with edge weights 5, 5, 5 and 11]
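The tie-breaking rule maps naturally onto tuple comparison. A minimal sketch in Python, assuming unique node IDs and no parallel edges (so no two distinct edges get the same key); `edge_key` is a hypothetical helper name.

```python
from typing import Tuple

WeightKey = Tuple[float, int, int]


def edge_key(weight: float, id_a: int, id_b: int) -> WeightKey:
    """Distinct, comparable key for edge (a, b): weight first, then node IDs.

    Assumes unique node IDs and no parallel edges, so keys never collide.
    Python compares tuples lexicographically, which gives the tie-breaking.
    """
    low, high = min(id_a, id_b), max(id_a, id_b)
    return (weight, low, high)
```

Both endpoints compute the same key for the same edge, since the IDs are ordered before building the tuple.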

  12. Design - Fragment • Each fragment behaves asynchronously and independently • Initially, every fragment consists of a single node • Upon termination, there will be only one fragment • Each fragment has a leader, which initiates fragment operations • The leader starts an operation by broadcast • Every node replies to the leader by convergecast • The spanning tree is used for communication • When two fragments are merged, the spanning tree is updated

  13. Design - Node • Each node has a pointer to the next node on the path to the leader (its father) • Each node knows which fragment it belongs to

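  14. Design - Union • Fragment F finds its MWOE e = (u, v), where u is in F and v is in a neighbor fragment F’ • F merges into the neighbor fragment F’ • F becomes a subtree of the bigger tree G • u becomes the new root of F • Nodes of F update their father pointers accordingly (the pointers on the path from u to the old root are reversed) • u sets its father to v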
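The pointer update can be done by reversing only the path from u to F's old root and then pointing u at v; every other node of F keeps its father. A minimal sketch in Python, run centrally on a dictionary of father pointers (an assumed representation; the function name is hypothetical). In GHS the same reversal is carried out by messages travelling along that path (slide 29).

```python
from typing import Dict, Optional

Father = Dict[int, Optional[int]]  # node -> father (None at the root)


def reroot_and_attach(father: Father, u: int, v: int) -> None:
    """Make u the root of its fragment, then hang the fragment under v.

    Only nodes on the path from u to the old root change their pointer.
    """
    prev: Optional[int] = v  # u's new father is v, in the other fragment
    node: Optional[int] = u
    while node is not None:
        nxt = father[node]    # remember the old father before overwriting
        father[node] = prev   # point back toward u (u itself points to v)
        prev, node = node, nxt
```

For a fragment 3 → 2 → 1 (root 1) with a side node 4 → 2, attaching at u = 3 under v = 10 yields 1 → 2 → 3 → 10, while node 4 still points to 2.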

  15. Problem 1 - Cycle • F and F’ might merge concurrently over their common MWOE e = (u, v) • Then u sets its father to v and v sets its father to u: we get a cycle of length two • Solution: both u and v become leaders of the merged fragment G • If we need one leader, we can break the symmetry by unique IDs

  16. Problem 2 – Unbalanced fragments • Choosing the MWOE and updating father pointers costs a number of messages proportional to the size of the fragment • Worst case: • F is large (on the order of N nodes) • F repeatedly merges with other fragments of size 1, updating its own pointers each time • We get O(N^2) message complexity, but can get O(N log N) if the merged fragments have equal sizes • Solution • Merge only the smaller fragment into the larger fragment • Update only the father pointers of the smaller fragment • We need to estimate the size of the fragment!
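The gap between the two policies can be checked with a small counting model. This model is an assumption: it charges one message per father-pointer update and ignores the cost of finding the MWOE. If the large fragment updates its own pointers every time it absorbs a single node, the total is 1 + 2 + ... + (N-1) = N(N-1)/2. Merging equal-size fragments pairwise costs (N/2) log2 N. Both function names are hypothetical.

```python
def pointer_updates_absorbing_singletons(n: int, smaller_updates: bool) -> int:
    """Father-pointer updates when one fragment absorbs n-1 single nodes.

    A merge costs one update per node of the fragment whose pointers change.
    smaller_updates=True: the smaller fragment updates (the slide's fix).
    smaller_updates=False: the big fragment updates every time (worst case).
    """
    big = 1  # size of the growing fragment
    total = 0
    for _ in range(n - 1):
        small = 1
        total += small if smaller_updates else big
        big += small
    return total


def pointer_updates_balanced(n: int) -> int:
    """Pairwise merges of equal-size fragments (n assumed a power of two)."""
    total, size = 0, 1
    while size < n:
        # n / (2 * size) merges this round, each updating `size` pointers.
        total += (n // (2 * size)) * size
        size *= 2
    return total
```

For N = 8 the worst case costs 28 updates, the balanced schedule 12, and letting the single node update costs only 7.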

  17. Fragment size estimation (Level) • It is hard to estimate the size of a distributed tree • Use the Level L as an estimate: a fragment of Level L has at least 2^L nodes • Each fragment has a Level • Level 0 – only one node • Level k > 0 – at least 2^k nodes • Lemma: if fragment F has Level L, then F has at least 2^L nodes • We want to guarantee the Lemma for all fragments • The Level doesn’t represent the size exactly: a Level L fragment can have many more than 2^L nodes!
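The Lemma follows by induction over the two ways a fragment can grow (merge and absorb, slide 26). A short derivation, which also bounds the number of Levels:

```latex
\begin{aligned}
&\text{Base: } L = 0 \;\Rightarrow\; |F| = 1 = 2^{0}.\\
&\text{Merge: } \mathrm{Level}(F_1) = \mathrm{Level}(F_2) = L \;\Rightarrow\; |F_1 \cup F_2| \ge 2^{L} + 2^{L} = 2^{L+1},\quad \mathrm{Level} = L + 1.\\
&\text{Absorb: } \mathrm{Level}(F') < \mathrm{Level}(F) = L \;\Rightarrow\; |F \cup F'| > |F| \ge 2^{L},\quad \mathrm{Level\ stays\ } L.\\
&\text{Hence } 2^{L} \le |F| \le N \;\Rightarrow\; L \le \log_2 N.
\end{aligned}
```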

  18. Design - Union • The algorithm will guarantee that the MWOE of every fragment F leads to a fragment F’ such that • Level(F) ≤ Level(F’) • Otherwise, if Level(F) > Level(F’) • F will wait for F’ to grow in Level • Waiting can lead to deadlocks! • Smaller fragments never wait for larger ones; they are immediately absorbed into the larger neighbor [Figure: F at Level 3 can’t find its MWOE and waits for F’ at Level 2]

  19. Fragments union operations

  20. Design - Union • Define the “core” edge of the fragment’s tree: • “The edge along which the most recent Merge occurred.” • Lemma: the core changes once per Level • Example, Level 1 fragment: nodes 1 and 2 merged on their common MWOE, then node 3 was absorbed • Example, Level 2 fragment: two Level 1 fragments merged on their common MWOE, then node 4 was absorbed

  21. Fragment Names and Leaders • We need to distinguish between fragments • Levels are not unique • Use the core for fragment identification • Fragment name: (core, Level) • Leaders: the two nodes adjacent to the core

  22. Example [Figure: example graph labelled 1 to 5; Connect messages form Level 1 fragments, followed by Test messages]

  23. Example [Figure: continuation; Test messages are answered by Reject and Accept, and Connect messages form a Level 2 fragment]

  24. Fragment Lifecycle

  25. Node state machine

  26. Specific messages: • Initiate: broadcast from the leader to find the MWOE; contains the fragment identity. • Report: convergecast of MWOE responses back to the leader. • Test: asks whether an edge is outgoing. • Accept/Reject: answers to Test. • Change-core: sent from the leader toward the endpoint of the MWOE. • Connect: sent across the MWOE to connect fragments. • We say a merge occurs when a Connect message has been sent both ways on the same edge (the two nodes must have the same level). • We say an absorb occurs when a Connect message has been sent on the edge from a lower-level node to a higher-level node.

  27. Description: Find MWOE – Level 0 • A single-node fragment • The node is in state Sleeping • The node awakens spontaneously or on receiving a message • The node chooses its MWOE from all adjacent edges • Sends Connect(Level=0) over the MWOE • Sets its state to Found
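The Level 0 step can be written as a small handler. This is a sketch in Python with a hypothetical `Node` class. Its names and fields are assumptions, and the `outbox` list stands in for the network. Following the original GHS pseudocode, the chosen edge is also marked Branch right away.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical node model for illustration; names are not from the slides.
SLEEPING, FOUND = "Sleeping", "Found"
BASIC, BRANCH = "Basic", "Branch"


@dataclass
class Node:
    node_id: int
    weights: Dict[int, float]  # neighbor id -> weight of the connecting edge
    state: str = SLEEPING
    level: int = 0
    edge_state: Dict[int, str] = field(default_factory=dict)
    outbox: List[Tuple[int, str, int]] = field(default_factory=list)

    def wakeup(self) -> None:
        """Level 0 start: connect over the minimum-weight adjacent edge."""
        if self.state != SLEEPING:
            return  # already awake; nothing to do
        for nbr in self.weights:
            self.edge_state.setdefault(nbr, BASIC)
        best = min(self.weights, key=self.weights.__getitem__)
        self.edge_state[best] = BRANCH
        self.level = 0
        self.state = FOUND
        self.outbox.append((best, "Connect", 0))  # (to, message, level)
```

A node with edges of weight 5, 1 and 7 sends a single Connect(0) over the weight-1 edge, and a second wakeup call does nothing.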

  28. Description: Find MWOE – Level L • Two Level (L-1) fragments merge over their common MWOE • The MWOE becomes the new core • The new Level L fragment G has identity (core, L) • Leaders broadcast Initiate(core, L, Find) to all nodes • Initiate contains the identity, the Level and the state Find • Initiate is also passed to all Level (L-1) fragments waiting to connect to nodes in G • G nodes start the Test-Accept-Reject protocol to find the MWOE • When a node finds its MWOE, a Report is convergecast to the leaders

  29. Description: Find MWOE – Level L (continued) • Convergecast of Report(W) on the fragment’s inbound edges (toward the leaders) • W(u) is defined as follows • u is a leaf: W(u) is the weight of the MWOE adjacent to u, or infinity if there is none • u is an internal node: W(u) = min(MWOE(v)), where v is a node in the subtree rooted at u • Every G node remembers the edge leading to the MWOE in its subtree (best edge) • Best edges create a path from the core to the node w adjacent to the MWOE • Leaders exchange Report messages over the core; one of them sends Change-core along this path • Every node on the path updates its inbound edge to point toward w • w sends Connect(L) over the MWOE
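The convergecast of W(u) and the best-edge path can be simulated centrally on one leader's half of the tree. This is a minimal sketch in Python. The tree is a dictionary of children, each node's own MWOE weight is given, and the recursive call stands in for the child's Report message; all names are assumptions. Both leaders would run the same computation, and in the original GHS paper the side whose W is smaller is the one that sends Change-core.

```python
from typing import Dict, List, Optional, Tuple


def convergecast(
    children: Dict[str, List[str]],
    local: Dict[str, float],
    root: str,
) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    """Compute W(u) for every node of one leader's subtree, bottom-up.

    local[u] is the weight of u's own MWOE (float("inf") if it has none).
    best[u] is None if the minimum is u's own edge, else the child
    whose subtree reported it (the "best edge").
    """
    W: Dict[str, float] = {}
    best: Dict[str, Optional[str]] = {}

    def visit(u: str) -> float:
        W[u], best[u] = local[u], None
        for c in children.get(u, []):
            w = visit(c)  # plays the role of Report(W(c)) from child c
            if w < W[u]:
                W[u], best[u] = w, c
        return W[u]

    visit(root)
    return W, best


def best_path(best: Dict[str, Optional[str]], root: str) -> List[str]:
    """Follow best edges from the leader down to the node on the MWOE."""
    path = [root]
    while best[path[-1]] is not None:
        path.append(best[path[-1]])
    return path
```

The returned path is the route a Change-core message would follow.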

  30. Test-Accept-Reject Protocol • Bookkeeping: each node keeps a list of incident edges in order of weight, classified as: • Branch (in the MST), • Rejected (leads to the same fragment), or • Basic (not yet classified). • A node u tests only Basic edges, sequentially in order of weight: • u sends a Test message with its (core, Level) over edge (u, v); the recipient v compares. • If the (core, Level) pairs are the same, v sends Reject (same fragment), and the edge is reclassified as Rejected. • If the (core, Level) pairs are unequal and Level(v) ≥ Level(u), then v sends Accept (different fragment); u does not reclassify the edge. • If Level(v) < Level(u), then v delays responding until Level(v) ≥ Level(u). • This is the Waiting… which can lead to deadlocks
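The recipient's decision and the tester's choice of the next edge can each be written as a pure function. This is a minimal sketch in Python. Fragment identity is the (core, Level) pair from slide 21, and the function names are hypothetical. Checking the Level condition first is equivalent to the slide's ordering, because identical (core, Level) pairs always have equal Levels.

```python
from typing import Dict, Hashable, Optional, Tuple

FragmentId = Tuple[Hashable, int]  # (core, Level)


def answer_test(receiver: FragmentId, sender: FragmentId) -> str:
    """Receiver's reply to Test(core, Level) from the sender."""
    r_core, r_level = receiver
    s_core, s_level = sender
    if r_level < s_level:
        return "delay"   # answer later, once Level(receiver) >= Level(sender)
    if (r_core, r_level) == (s_core, s_level):
        return "reject"  # same fragment: the edge becomes Rejected
    return "accept"      # different fragment: the edge is outgoing


def next_basic_edge(weights: Dict[int, float],
                    edge_state: Dict[int, str]) -> Optional[int]:
    """Tester side: the lowest-weight edge still classified as Basic."""
    basic = [nbr for nbr, s in edge_state.items() if s == "Basic"]
    return min(basic, key=weights.__getitem__, default=None)
```

When `next_basic_edge` returns None, the node has no outgoing candidate left and reports infinity.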

  31. Merge • Suppose F and F have the same MWOE and Level • Level() Level() • Both and send Connect() over one in each direction • becomes a new of Level fragment • Nodes and send Initiate(,,) F F’

  32. Absorb • Suppose F’ absorbs into fragment F via an edge e = (u’, u), with u’ in F’ and u in F, while F is working on determining its MWOE. • Level(F’) < Level(F) • Node u’ sends Connect(Level(F’)) • Node u immediately sends Initiate(core(F), Level(F), state) • state: • If u has not yet reported its local MWOE, send Initiate(Find) • Otherwise, send Initiate(Found). We will see why the new fragment’s MWOE can’t be from F’.
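The receiving side of Connect decides between absorb, merge and waiting. This is a minimal sketch modelled on the Connect handler in the original GHS paper, reduced to a pure function. The parameter names are hypothetical, and the edge-state and find-count bookkeeping is omitted. It relies on the GHS invariant that Connect never arrives from a higher-Level node, because the sender was Accepted first. The core is named by its weight, which is unique since edge weights are distinct.

```python
from typing import Tuple


def on_connect(my_core: float, my_level: int, my_state: str,
               sender_level: int, edge_is_basic: bool,
               edge_weight: float) -> Tuple[object, ...]:
    """Reply of a node receiving Connect(sender_level) over an edge.

    Returns ("Initiate", core, level, state) or ("Wait",).
    """
    if sender_level < my_level:
        # Absorb: pass our identity and state. my_state is "Find" if this
        # node has not reported its local MWOE yet, otherwise "Found".
        return ("Initiate", my_core, my_level, my_state)
    if edge_is_basic:
        # Same Level, but we have not chosen this edge as our MWOE: wait.
        return ("Wait",)
    # Same Level and we also sent Connect on this edge: merge.
    # The edge becomes the core of a fragment one Level higher.
    return ("Initiate", edge_weight, my_level + 1, "Find")
```

The absorb branch is taken even if the edge is still Basic at the receiver, which matches the "immediately sends Initiate" rule above.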

  33. Correctness Given Properties 1 and 2, it is sufficient to verify: • MWOE is correctly chosen by every fragment • No deadlocks due to Waits

  34. MWOE Correctness (Async Absorb) Case: F’ absorbs into F after u reported MWOE(u). We need to prove that MWOE(u) is still valid after the Absorb. Claim 1: the reported MWOE(u) cannot be the edge (u, u’). Proof: • Since MWOE(u) has already been reported, it must lead to a node with Level ≥ Level(u). • But the level of u’ is still < Level(u) when the absorb occurs. • So MWOE(u) is a different edge, one whose weight < weight(u, u’) (u tests edges in order of weight, so a Test on a lighter (u, u’) would have been delayed and u could not have reported yet). Claim 2: the MWOE of the combined component is not outgoing from a node in F’. Proof: • (u’, u) is the MWOE of F’, so there are no edges outgoing from F’ with weight < weight(u’, u). • So there are no edges outgoing from F’ with weight < the already-reported MWOE(u). • So the MWOE of the combined fragment isn’t outgoing from F’.

  35. Liveness Lemma: After any finite sequence of merges and absorbs, either the forest consists of one tree (so we’re done), or some merge or absorb is enabled. Proof: • Consider the current “fragment digraph”: • Nodes represent fragments • Directed edges represent MWOEs • The minimum-weight edge not yet in the forest is the MWOE of both fragments it connects, so there must be some pair F, F’ whose MWOEs point to each other. • We can combine these fragments using either merge or absorb: • If same level, merge, else absorb. • So merging and absorbing are enough to make progress. • If one of F, F’ Waits, it Waits for a smaller-Level fragment only • But the lowest-Level fragment is NEVER blocked and can grow by Merge or Absorb [Figure: example fragment digraph]
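The key step of the proof can be checked mechanically. Build the fragment digraph (each fragment points along its MWOE) and look for pairs that point at each other. The globally lightest outgoing edge always produces such a pair. This is a minimal sketch in Python. It assumes the edges listed are outgoing edges between distinct fragments with distinct weights, and that every fragment has at least one of them. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (fragment_a, fragment_b, weight)


def mutual_mwoe_pairs(fragments: List[int],
                      edges: List[Edge]) -> List[Tuple[int, int]]:
    """Pairs of fragments whose MWOEs point at each other."""
    mwoe: Dict[int, Edge] = {}
    for e in edges:
        a, b, w = e
        for f in (a, b):
            if f not in mwoe or w < mwoe[f][2]:
                mwoe[f] = e
    pairs: List[Tuple[int, int]] = []
    for f in fragments:
        a, b, _ = mwoe[f]
        other = b if a == f else a
        # Same edge chosen from both sides; report each pair once.
        if mwoe[other] == mwoe[f] and f < other:
            pairs.append((f, other))
    return pairs
```

In the example below the lightest edge (weight 3) joins fragments 1 and 4, and that is the only mutual pair.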

  36. The Algorithm (As Executed at Each Node)

  37. The Algorithm (As Executed at Each Node)

  38. The Algorithm (As Executed at Each Node)

  39. The Algorithm (As Executed at Each Node)

  40. Simulation – Ring 3 • Communication • Odd link – 1 cycle • Even link – 2 cycles [Figure: ring with nodes and links labelled 1, 2, 3]

  41. Initialization [Figure panels: Events, Code, State, Network]

  42. Step 1 [Figure panels: Events, Code, State, Network]

  43. Step 2 [Figure panels: Events, Code, State, Network]

  44. Step 3 [Figure panels: Events, Code, State, Network]

  45. Step 4 [Figure panels: Events, Code, State, Network]

  46. Step 5 [Figure panels: Events, Code, State, Network]

  47. Step 6 [Figure panels: Events, Code, State, Network]

  48. Step 7 [Figure panels: Events, Code, State, Network]

  49. Step 8 [Figure panels: Events, Code, State, Network]

  50. Step 9 [Figure panels: Events, Code, State, Network]
