
UBI529








  1. UBI529 3. Distributed Graph Algorithms

  2. Distributed Algorithms Models • Interprocess Communication method: accessing shared memory, point-to-point or broadcast messages, or remote procedure calls. • Timing model: synchronous or asynchronous models. • Failure models: reliable or faulty behavior; Byzantine failures (a failed processor can behave arbitrarily).

  3. We assume • A distributed network, modeled as a graph: nodes are processors and edges are communication links. • Nodes can communicate directly (only) with their neighbors through the edges. • Nodes have unique processor identities. • Synchronous model: time is measured in rounds (time steps). • One message (typically of size O(log n)) can be sent through an edge in a time step. A node can send messages simultaneously through all its edges in a round. • No failure of nodes or edges. No malicious nodes.

  4. 2.1 Vertex and Tree Coloring • Vertex Coloring • Sequential Vertex Coloring Algorithms • Distributed Synchronous Vertex Coloring Algorithm • Distributed Tree Coloring Algorithms

  5. Preliminaries • Vertex Coloring Problem: Given an undirected graph G = (V,E), assign a color cv to each vertex v Є V such that if e = (v,w) Є E, then cv ≠ cw. The aim is to use the minimum number of colors. • Definition 2.1.1: Given an undirected graph G, the chromatic number Χ(G) is the minimum number of colors needed to color it. A vertex k-coloring uses exactly k colors. If Χ(G) = k, G is k-colorable but not (k-1)-colorable. • Calculating Χ(G) is NP-hard; the 3-coloring decision problem is NP-complete. • Applications: • Assignment of radio frequencies: colors represent frequencies, transmitters are the vertices; two transmitters are neighbors if they interfere. • University course scheduling: vertices are courses; an edge joins two courses that share a student. • Fast register allocation in compilers: vertices are variables; two variables are neighbors if they can be live at the same time.

  6. Sequential Algorithm for Vertex Coloring • Algorithm 2.1.1: Sequential Vertex Coloring • Input: G with v1, v2, ..., vn • Output: Vertex coloring f : VG -> {1,2,3,...} • 1. For i = 1 to n do • 2. f(vi) := smallest color number that does not conflict with any of the already colored neighbors of vi • 3. Return vertex coloring f
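A minimal sequential sketch of this greedy rule (the graph is given as an assumed adjacency-list dictionary, and the helper name greedy_coloring is ours, not from the slides):

```python
def greedy_coloring(adj):
    """Color vertices in the given order; each vertex takes the smallest
    color (1, 2, ...) not used by an already-colored neighbor."""
    colors = {}
    for v in adj:                        # visit v1, v2, ..., vn in order
        used = {colors[u] for u in adj[v] if u in colors}
        c = 1
        while c in used:                 # smallest non-conflicting color
            c += 1
        colors[v] = c
    return colors

# A 5-cycle has Delta = 2, so greedy uses at most Delta + 1 = 3 colors.
cycle5 = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}
coloring = greedy_coloring(cycle5)
```

As Theorem 2.1.1 states, the coloring is always proper and stays within Δ+1 colors.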

  7. Vertex Coloring Algorithms • Definition 2.1.2: The number of neighbors of a vertex v is called the degree of v, δ(v). The maximum degree over all vertices of a graph G is called the graph degree, Δ(G) = Δ. • Theorem 2.1.1: The algorithm is correct and terminates in O(n) steps. The algorithm uses at most Δ+1 colors. • Proof: Correctness and termination are straightforward. Since each node has at most Δ neighbors, there is always at least one color free in the range {1, …, Δ+1}. • Remarks: • For many graphs, coloring can be done with far fewer than Δ+1 colors. • This algorithm is not distributed; only one processor is active at a time. But the idea of Algorithm 1.4 can be used to define a "local" coloring subroutine (1.7).

  8. Heuristic Vertex Coloring Algorithm: Largest Degree First • Idea (two observations): A vertex of large degree is harder to color than one of smaller degree. Also, a vertex with more colored neighbors will be harder to color later. • Algorithm 2.1.2: Largest Degree First Algorithm • Input: G with v1, v2, ..., vn • Output: Vertex coloring f : VG -> {1,2,3,...} • 1. While there are uncolored vertices of G • 2. Among the uncolored maximum-degree vertices, choose the vertex v with the maximum colored degree • 3. Assign the smallest possible color k to v: f(v) := k • 4. Return vertex coloring f • The coloring order in the diagram is v3, v1, v2, v4, v8, v6, v7, v5 • Colored degree: the number of different colors used on the neighbors of v
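A sketch of this heuristic, again assuming an adjacency-list dictionary; ties among maximum-degree vertices are broken by colored degree as described above:

```python
def largest_degree_first(adj):
    """Heuristic coloring: repeatedly pick, among uncolored vertices of
    maximum degree, one whose neighbors already use the most distinct
    colors (its 'colored degree'), and give it the smallest legal color."""
    colors = {}
    uncolored = set(adj)
    while uncolored:
        max_deg = max(len(adj[v]) for v in uncolored)
        candidates = [v for v in uncolored if len(adj[v]) == max_deg]
        # colored degree = number of distinct colors among v's neighbors
        v = max(candidates,
                key=lambda v: len({colors[u] for u in adj[v] if u in colors}))
        used = {colors[u] for u in adj[v] if u in colors}
        c = 1
        while c in used:
            c += 1
        colors[v] = c
        uncolored.remove(v)
    return colors

# Complete bipartite K(2,3): the heuristic colors it with 2 colors.
K23 = {'a': ['x', 'y', 'z'], 'b': ['x', 'y', 'z'],
       'x': ['a', 'b'], 'y': ['a', 'b'], 'z': ['a', 'b']}
col = largest_degree_first(K23)
```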

  9. Coloring Trees: A Distributed Algorithm • Lemma 2.1.1: X(Tree) <= 2. • Proof: If the distance of a node to the root is odd (even), color it 1 (0). An odd node has only even neighbors and vice versa. • If we assume that each node knows its parent (the root has no parent) and its children in the tree, this constructive proof gives a very simple algorithm. • Algorithm 2.1.3 [Slow tree coloring]: • 1. The root sends color 0 to its children. (The root is colored 0.) • 2. On receiving a message x from its parent, a node u picks color cu = 1 - x and sends cu to its children.
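The slow tree coloring can be simulated centrally as a wave of parent-to-child messages; the children dictionary below is an assumed representation of the rooted tree:

```python
from collections import deque

def slow_tree_coloring(children, root):
    """Simulate Algorithm 2.1.3: the root takes color 0; on receiving
    color x from its parent, a node picks 1 - x and forwards it."""
    color = {root: 0}
    queue = deque([root])              # nodes whose message is in flight
    while queue:
        u = queue.popleft()
        for v in children.get(u, []):
            color[v] = 1 - color[u]    # cv = 1 - x
            queue.append(v)
    return color

tree = {0: [1, 2], 1: [3, 4], 4: [5]}
col = slow_tree_coloring(tree, 0)
```

The resulting color of each node is simply its depth parity, matching the proof of Lemma 2.1.1.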

  10. Distributed Tree Coloring • Remarks: • By the proof of Lemma 2.1.1, Algorithm 2.1.3 is correct. • The time complexity of the algorithm is the height of the tree. • When the root is chosen randomly, this can be as large as the diameter of the tree.

  11. 2.2 Distributed Tree-based Communication Algorithms • Broadcast • Convergecast • BFS Tree Construction

  12. Broadcast • Broadcasting means sending a message from a source node to all other nodes of the network. • Two basic broadcasting approaches are flooding and spanning-tree-based broadcast. • Flooding: A source node s wants to send a message to all nodes in the network. s simply forwards the message over all its edges. Any vertex v ≠ s, upon receiving the message for the first time (over an edge e), forwards it on every other edge. Upon receiving the message again, it does nothing.

  13. Broadcast • Definition 2.2.1 [Broadcast]: A broadcast operation is initiated by a single processor, the source. The source wants to send a message to all other nodes in the system. • Definition 2.2.2 [Distance, Radius, Diameter]: • The distance between two nodes u, v in an undirected graph is the number of hops of a minimum path between u and v. • The radius of a node u in a graph is the maximum distance between u and any other node. The radius of a graph is the minimum radius of any node in the graph. • The diameter of a graph is the maximum distance between two arbitrary nodes.

  14. Broadcast • Theorem 2.2.1 [Lower Bound]: The message complexity of a broadcast is at least n-1. The radius of the graph is a lower bound for the time complexity. • Proof: Every node must receive the message. • Remarks: • You can use a pre-computed spanning tree to do the broadcast with tight message complexity. • If the spanning tree is a breadth-first spanning tree (for a given source), then the time complexity is also tight. • Definition 2.2.3: A graph (system/network) is clean if the nodes do not know the topology of the graph. • Theorem 2.2.2 [Clean Lower Bound]: For a clean network, the number of edges is a lower bound for the broadcast message complexity. • Proof: If you do not try every edge, you might miss a whole part of the graph behind it.

  15. Flooding • Algorithm 2.2.1 [Flooding]: The source sends the message to all neighbors. Each node receiving the message for the first time forwards it to all (other) neighbors. • Remarks: • If node v receives the message first from node u, then node v calls node u its "parent". This parent relation defines a spanning tree T. If the flooding algorithm is executed in a synchronous system, then T is a breadth-first spanning tree (with respect to the root). • More interestingly, in asynchronous systems too the flooding algorithm terminates after r time units, where r is the radius of the source. (But note that the constructed spanning tree need not be breadth-first.)

  16. Flooding Analysis • Theorem: The message complexity of flooding is Θ(|E|) and the time complexity is Θ(D), where D is the diameter of G. • Proof: The message complexity follows from the fact that each edge delivers the message at least once and at most twice (once in each direction). To show the time complexity, we use induction on t: after t time units, the message has already reached every vertex at a distance of t or less from the source.
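A round-by-round simulation of synchronous flooding, matching the induction in the proof (the adjacency-list input is an assumed representation; the returned round count equals the eccentricity of the source):

```python
def flood(adj, source):
    """Synchronous flooding: in each round, every newly informed node
    forwards the message to all neighbors; the first sender becomes
    the parent, yielding a BFS spanning tree."""
    parent = {source: None}
    frontier = [source]
    rounds = 0
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in parent:      # first reception: adopt parent
                    parent[v] = u
                    nxt.append(v)
        if nxt:
            rounds += 1
        frontier = nxt
    return parent, rounds

# Path 0-1-2-3: the message needs 3 rounds to reach node 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
parent, rounds = flood(path, 0)
```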

  17. Broadcast Over a Rooted Spanning Tree • Suppose processors already have information about a rooted spanning tree of the communication topology • tree: connected graph with no cycles • spanning tree: contains all processors • rooted: there is a unique root node • Implemented via parent and children local variables at each processor • indicate which incident channels lead to parent and children in the rooted spanning tree

  18. Broadcast Over a Rooted Spanning Tree: A Simple Algorithm • 1. root initially sends msg to its children • 2. when a node receives msg from its parent • sends msg to its children • terminates (sets a local boolean to true) • Synchronous model: • time is depth of the spanning tree, which is at most n - 1 • number of messages is n - 1, since one message is sent over each spanning tree edge • Asynchronous model: • same time and messages

  19. Tree Broadcast • Assume that a spanning tree has been constructed. • Theorem: For every n-vertex graph G with a spanning tree T rooted at r0, the message complexity of broadcast is n−1 and the time complexity is depth(T). • A broadcast algorithm can be used to construct a spanning tree in G. • The message complexity of broadcast is asymptotically equivalent to the message complexity of spanning tree construction. • Using a breadth-first spanning tree, we get the optimal message and time complexities for broadcast.

  20. Convergecast • Again, suppose a rooted spanning tree has already been computed by the processors • parent and children variables at each processor • Do the opposite of broadcast: • leaves send messages to their parents • non-leaves wait to get message from each child, then send combined info to parent
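A recursive sketch of convergecast with a pluggable combine function (the children-dictionary tree representation and the function names are illustrative, not from the slides):

```python
def convergecast(children, values, root, combine=max):
    """Each leaf reports its value; an internal node waits for the
    reports of all children, combines them with its own value, and
    passes the combined result up toward the root."""
    reports = [convergecast(children, values, c, combine)
               for c in children.get(root, [])]
    return combine([values[root]] + reports)

tree = {0: [1, 2], 2: [3, 4]}          # rooted at 0
vals = {0: 7, 1: 3, 2: 9, 3: 12, 4: 1}
```

With combine=max the root learns the maximum value in the tree; with combine=sum it learns the total, one message per tree edge either way.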

  21. Convergecast [figure: an example tree] • solid arrows: parent-child relationships • dotted lines: non-tree edges

  22. Finding a Spanning Tree Given a Root • a distinguished processor is known, to serve as the root • root sends M to all its neighbors • when non-root first gets M • set the sender as its parent • send "parent" msg to sender • send M to all other neighbors • when get M otherwise • send "reject" msg to sender • use "parent" and "reject" msgs to set children variables and know when to terminate

  23. Execution of Spanning Tree Alg. [figure: example execution] • Both models: O(m) messages, O(diam) time • Asynchronous: not necessarily a BFS tree • Synchronous: always gives a breadth-first search (BFS) tree

  24. 2.3 Distributed Minimum Spanning Tree Algorithms

  25. Minimum Spanning Tree • Minimum spanning tree. Given a connected graph G = (V, E) with real-valued edge weights ce, an MST is a subset of the edges T ⊆ E such that T is a spanning tree whose sum of edge weights Σe∈T ce is minimized. • Cayley's Theorem. There are n^(n-2) spanning trees of Kn, so we can't solve by brute force. [figure: a weighted graph G = (V, E) and an MST T with Σe∈T ce = 50]

  26. Applications • MST is a fundamental problem with diverse applications. • Network design: telephone, electrical, hydraulic, TV cable, computer, road networks. • Approximation algorithms for NP-hard problems: traveling salesperson problem, Steiner tree. • Indirect applications: max-bottleneck paths; LDPC codes for error correction; image registration with Rényi entropy; learning salient features for real-time face verification; reducing data storage in sequencing amino acids in a protein; modeling locality of particle interactions in turbulent fluid flows; autoconfiguration protocol for Ethernet bridging to avoid cycles in a network. • Cluster analysis.

  27. Greedy Algorithms • Kruskal's algorithm. Start with T = ∅. Consider edges in ascending order of cost. Insert edge e into T unless doing so would create a cycle. • Reverse-Delete algorithm. Start with T = E. Consider edges in descending order of cost. Delete edge e from T unless doing so would disconnect T. • Prim's algorithm. Start with some root node s and greedily grow a tree T from s outward. At each step, add the cheapest edge e to T that has exactly one endpoint in T. • Remark. All three algorithms produce an MST.

  28. Greedy Algorithms • Simplifying assumption. All edge costs ce are distinct. • Cut property. Let S be any subset of nodes, and let e be the min-cost edge with exactly one endpoint in S. Then the MST contains e. • Cycle property. Let C be any cycle, and let f be the max-cost edge belonging to C. Then the MST does not contain f. [figure: e is in the MST; f is not in the MST]

  29. Cycles and Cuts • Cycle. A set of edges of the form a-b, b-c, c-d, …, y-z, z-a. Example: cycle C = 1-2, 2-3, 3-4, 4-5, 5-6, 6-1. • Cutset. A cut is a subset of nodes S. The corresponding cutset D is the subset of edges with exactly one endpoint in S. Example: cut S = {4, 5, 8}, cutset D = 5-6, 5-7, 3-4, 3-5, 7-8.

  30. Cycle-Cut Intersection • Claim. A cycle and a cutset intersect in an even number of edges. • Pf. (by picture) Cycle C = 1-2, 2-3, 3-4, 4-5, 5-6, 6-1; cutset D = 3-4, 3-5, 5-6, 5-7, 7-8; intersection = 3-4, 5-6. [figure: the cycle crossing between S and V - S]
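The claim can be checked on the slide's own example; representing each undirected edge as a frozenset of its endpoints is our choice of encoding:

```python
def cutset(S, edges):
    """Edges with exactly one endpoint inside the cut S."""
    return {e for e in edges if len(e & S) == 1}

# Graph edges consistent with the example on slides 29-30.
E = {frozenset(e) for e in
     [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1),
      (3, 5), (5, 7), (7, 8)]}
C = {frozenset(e) for e in
     [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]}
S = {4, 5, 8}
D = cutset(S, E)
```

The intersection C ∩ D has exactly two edges (3-4 and 5-6), an even number as claimed.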

  31. Greedy Algorithms • Simplifying assumption. All edge costs ce are distinct. • Cut property. Let S be any subset of nodes, and let e be the min-cost edge with exactly one endpoint in S. Then the MST T* contains e. • Pf. (exchange argument) • Suppose e does not belong to T*, and let's see what happens. • Adding e to T* creates a cycle C in T*. • Edge e is both in the cycle C and in the cutset D corresponding to S ⇒ there exists another edge, say f, that is in both C and D. • T' = T* ∪ {e} - {f} is also a spanning tree. • Since ce < cf, cost(T') < cost(T*). • This is a contradiction. ▪

  32. Greedy Algorithms • Simplifying assumption. All edge costs ce are distinct. • Cycle property. Let C be any cycle in G, and let f be the max-cost edge belonging to C. Then the MST T* does not contain f. • Pf. (exchange argument) • Suppose f belongs to T*, and let's see what happens. • Deleting f from T* splits T* into two components; let S be one of them. • Edge f is both in the cycle C and in the cutset D corresponding to S ⇒ there exists another edge, say e, that is in both C and D. • T' = T* ∪ {e} - {f} is also a spanning tree. • Since ce < cf, cost(T') < cost(T*). • This is a contradiction. ▪

  33. Prim's Algorithm: Proof of Correctness • Prim's algorithm. [Jarník 1930, Dijkstra 1957, Prim 1959] • Initialize S = {any node}. • Apply the cut property to S. • Add the min-cost edge in the cutset corresponding to S to T, and add the one new explored node u to S.

  34. Implementation: Prim's Algorithm • Implementation. Use a priority queue à la Dijkstra. • Maintain the set of explored nodes S. • For each unexplored node v, maintain the attachment cost a[v] = cost of the cheapest edge from v to a node in S. • O(n^2) with an array; O(m log n) with a binary heap.

Prim(G, c) {
   foreach (v ∈ V) a[v] ← ∞
   Initialize an empty priority queue Q
   foreach (v ∈ V) insert v onto Q
   Initialize set of explored nodes S ← ∅
   while (Q is not empty) {
      u ← delete min element from Q
      S ← S ∪ {u}
      foreach (edge e = (u, v) incident to u)
         if ((v ∉ S) and (c_e < a[v])) decrease priority a[v] to c_e
   }
}
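A runnable Python sketch of this implementation. Instead of decrease-key it uses the common lazy-deletion variant (stale heap entries are simply skipped on pop), which also gives O(m log n):

```python
import heapq

def prim(adj, s):
    """adj: {u: [(v, w), ...]}.  Returns (tree edges, total weight).
    Lazy heap: stale entries are skipped rather than decreased."""
    explored = {s}
    heap = [(w, s, v) for v, w in adj[s]]
    heapq.heapify(heap)
    tree, total = [], 0
    while heap and len(explored) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in explored:               # stale entry: v was reached cheaper
            continue
        explored.add(v)                 # cut property: (u, v) is in the MST
        tree.append((u, v, w))
        total += w
        for x, wx in adj[v]:
            if x not in explored:
                heapq.heappush(heap, (wx, v, x))
    return tree, total

# 4-cycle 0-1-2-3 with a chord; MST weight is 1 + 2 + 1 = 4.
adj = {0: [(1, 1), (2, 3), (3, 4)], 1: [(0, 1), (2, 2)],
       2: [(1, 2), (3, 1), (0, 3)], 3: [(2, 1), (0, 4)]}
tree, total = prim(adj, 0)
```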

  35. Kruskal's Algorithm: Proof of Correctness • Kruskal's algorithm. [Kruskal, 1956] • Consider edges in ascending order of weight. • Case 1: If adding e to T creates a cycle, discard e according to the cycle property. • Case 2: Otherwise, insert e = (u, v) into T according to the cut property, where S = the set of nodes in u's connected component.

  36. Implementation: Kruskal's Algorithm • Implementation. Use the union-find data structure. • Build the set T of edges in the MST. • Maintain a set for each connected component. • O(m log n) for sorting and O(m α(m, n)) for union-find, where α is essentially a constant. (Since m ≤ n^2, log m is O(log n).)

Kruskal(G, c) {
   Sort edge weights so that c_1 ≤ c_2 ≤ ... ≤ c_m
   T ← ∅
   foreach (u ∈ V) make a set containing singleton u
   for i = 1 to m
      (u, v) = e_i
      if (u and v are in different sets) {    // different connected components?
         T ← T ∪ {e_i}
         merge the sets containing u and v    // merge two components
      }
   return T
}
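A compact Python version of the same scheme. The union-find here uses path halving only (no union by rank), a simplification of the structure the slide assumes:

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) on vertices 0..n-1.
    Returns (tree edges, total weight)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    tree, total = [], 0
    for w, u, v in sorted(edges):           # ascending order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # no cycle: take the edge
            parent[ru] = rv
            tree.append((u, v, w))
            total += w
    return tree, total

# Same graph as in the Prim sketch; MST weight is 4.
edges = [(1, 0, 1), (2, 1, 2), (1, 2, 3), (4, 3, 0), (3, 0, 2)]
tree, total = kruskal(4, edges)
```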

  37. Chang-Roberts Spanning Tree Algorithm • Distributed spanning tree construction: for a graph G = (V,E), a spanning tree is a maximal connected acyclic subgraph T = (V,E'), E' ⊆ E, such that if one more edge is added, the subgraph is no longer a tree. Used for broadcasting in a network. • Chang-Roberts algorithm {the root is known}: uses signals and acks, similar to the termination detection algorithm, and uses the same rule for sending acknowledgments. • Question: What if the root is not designated?

  38. Chang-Roberts Spanning Tree Algorithm

program probe-echo
define    N : integer {no. of neighbors}
          C, D : integer
initially parent := i; C := 0; D := 0

{for the initiator}
send probes to each neighbor;
D := no. of neighbors;
do D != 0 ∧ echo -> D := D - 1 od
{D = 0 signals the end}

{for a non-initiator process i > 0}
do  probe ∧ parent = i ∧ C = 0 ->
        C := 1; parent := sender;
        if i is not a leaf -> send probes to non-parent neighbors;
            D := no. of non-parent neighbors
        fi
[]  echo -> D := D - 1
[]  probe ∧ sender != parent -> send echo to sender
[]  C = 1 ∧ D = 0 -> send echo to parent; C := 0
od

  39. Graph Traversal • Many applications involve exploring an unknown graph with a visitor (a token, a mobile agent, or a robot). Consider web crawlers, exploration of social networks, graph layouts for visualization or drawing, etc. • The goal of traversal is to visit every node at least once and return to the starting point. • How efficiently can this be done? • What is the guarantee that all nodes will be visited? • What is the guarantee that the algorithm will terminate?

  40. Graph Traversal and Spanning Tree Formation • Tarry's algorithm is one of the oldest (1895). • Rule 1. Send the token toward each neighbor exactly once. • Rule 2. If Rule 1 is not applicable, then send the token to the parent. • A possible route is: 0 1 2 5 3 1 4 6 2 6 4 1 3 5 2 1 0 • The nodes and their parent pointers generate a spanning tree that may not be a DFS tree.
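Tarry's two rules can be simulated with a token on an assumed adjacency-list graph. On a connected graph the token traverses each edge once in each direction (2|E| moves) and ends at the initiator:

```python
def tarry(adj, start):
    """Tarry's traversal.  Rule 1: forward the token over each incident
    channel at most once.  Rule 2: use the parent channel only when no
    other unused channel remains.  Returns the route and parent pointers."""
    unused = {u: list(adj[u]) for u in adj}  # channels u has not used yet
    parent = {start: None}
    route, u = [start], start
    while not (u == start and not unused[start]):
        others = [v for v in unused[u] if v != parent[u]]
        v = others[0] if others else parent[u]   # forced back to parent
        unused[u].remove(v)
        if v not in parent:                      # first visit: set parent
            parent[v] = u
        route.append(v)
        u = v
    return route, parent

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
route, parent = tarry(triangle, 0)
```

The route length is 2|E| + 1 = 7 hops on the triangle, and the parent pointers form a spanning tree rooted at the initiator.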

  41. Distributed MST • Def (MST fragment): In a weighted graph G = (V,E,w), a tree T in G is called an MST fragment of G iff there exists an MST of G such that T is a subgraph of that MST. • Def (MWOE): An edge e is an outgoing edge of an MST fragment T iff exactly one of its endpoints belongs to T. The minimum-weight outgoing edge is denoted MWOE(T). • Lemma: Consider an MST fragment T of a graph G = (V,E,w). Let e = MWOE(T). Then T ∪ e is an MST fragment as well. • Proof: Let TM be an MST containing T. If TM contains e, we are done. Otherwise, adding e to TM creates a cycle; this cycle contains another outgoing edge e' of T, and clearly w(e') >= w(e). Discarding e' yields a new tree T'M with w(T'M) <= w(TM), so T'M is also an MST, and it contains T ∪ e.
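A small helper illustrating the MWOE definition (the (weight, u, v) edge representation and the function name are assumptions for this sketch):

```python
def mwoe(fragment, edges):
    """Minimum-weight outgoing edge of a fragment (a set of vertices).
    edges: iterable of (weight, u, v).  Returns None if no edge leaves."""
    outgoing = [(w, u, v) for w, u, v in edges
                if (u in fragment) != (v in fragment)]   # exactly one endpoint in
    return min(outgoing, default=None)

E = [(1, 0, 1), (2, 1, 2), (1, 2, 3), (4, 3, 0), (3, 0, 2)]
```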

  42. Minimum Spanning Tree • Given a weighted graph G = (V, E), generate a spanning tree T = (V, E’) such that the sum of the weights of all the edges is minimum. • Applications • On Euclidean plane, approximate solutions to the traveling salesman problem, • Lease phone lines to connect the different offices with a minimum cost, • Visualizing multidimensional data (how entities are related to each other) • We are interested in distributed algorithms only The traveling salesman problem asks for the shortest route to visit a collection of cities and return to the starting point.

  43. Example

  44. Sequential algorithms for MST • Review (1) Prim’s algorithm and (2) Kruskal’s algorithm. • Theorem. If the weight of every edge is distinct, then the MST is unique.

  45. Gallagher-Humblet-Spira (GHS) Algorithm • GHS is a distributed MST algorithm. Bottom-up approach: the MST is recursively constructed from fragments joined by an edge of least cost (parallel fragment merging in the spirit of Borůvka's algorithm). [figure: two fragments joined by the least-cost edge]

  46. Challenges Challenge 1. How will the nodes in a given fragment identify the edge to be used to connect with a different fragment? A root node in each fragment is the coordinator

  47. Challenges • Challenge 2. How will a node in T1 determine if a given edge connects to a node of a different tree T2 or the same tree T1? Why will node 0 choose the edge e with weight 8, and not the edge with weight 4? • Nodes in a fragment acquire the same name before augmentation.

  48. Two main steps • Each fragment has a level. Initially each node is a fragment at level 0. • (MERGE) Two fragments at the same level L combine to form a fragment of level L+1 • (ABSORB) A fragment at level L is absorbed by another fragment at level L’ (L < L’)
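A sequential skeleton of this fragment-merging process, offered as a sketch only: levels, the MERGE/ABSORB distinction, and all message passing are omitted, so this is not the full GHS protocol. Distinct edge weights (here enforced by tuple ordering) are assumed:

```python
def fragment_merge_mst(n, edges):
    """In each phase, every fragment selects its MWOE, and the fragments
    joined by selected edges are merged; repeat until one fragment remains.
    edges: list of (weight, u, v) on vertices 0..n-1."""
    parent = list(range(n))              # union-find over fragments
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    while True:
        best = {}                        # fragment root -> its MWOE
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue                 # not an outgoing edge
            for r in (ru, rv):
                if r not in best or (w, u, v) < best[r]:
                    best[r] = (w, u, v)
        if not best:
            break                        # a single fragment remains: done
        for w, u, v in set(best.values()):
            ru, rv = find(u), find(v)
            if ru != rv:                 # merge the two fragments
                parent[ru] = rv
                tree.append((w, u, v))
    return tree

edges = [(1, 0, 1), (2, 1, 2), (1, 2, 3), (4, 3, 0), (3, 0, 2)]
tree = fragment_merge_mst(4, edges)
```

Each phase at least halves the number of fragments, which is why the fragment-based approach needs only O(log n) phases.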

  49. Least-Weight Outgoing Edge • To test if an edge is outgoing, each node sends a test message through a candidate edge. The receiving node may send accept or reject. • The root broadcasts initiate in its own fragment, collects reports from the other nodes about eligible edges using a convergecast, and determines the least-weight outgoing edge.
