
Graphs and Finding your way in the wilderness

Graphs and Finding your way in the wilderness. Chapter 14 in DS&PS, Chapter 9 in DS&AA. General Problems: What is the shortest path from A to B? What is the shortest path from A to all nodes? What is the shortest/cheapest path between any two nodes?

waite

Presentation Transcript


  1. Graphs and Finding your way in the wilderness • Chapter 14 in DS&PS • Chapter 9 in DS&AA

  2. General Problems • What is the shortest path from A to B? • What is the shortest path from A to all nodes? • What is the shortest/cheapest path between any two nodes? • Search for a goal node, i.e. a node with specific properties, like a win in chess. • What is the shortest tour? (visit all vertices) • no known polynomial algorithm in number of edges • What is the longest path from A to B?

  3. Some Applications • Route Finding • metacrawler, on net, claims to find shortest path between two points • Game-Playing • great increase in chess/checkers end-game play occurred when recognized as graph search, not tree search • Critical Path Analysis • multiperson/task job schedule analysis • answers what are the key tasks that can’t slip • Travel arrangement • cheapest cost to meet constraints

  4. Definitions • Graph: set of edges E and vertices V. • Edge is a pair of vertices (v,w) • edge may be directed or undirected • edge may have a cost • v and w are said to be adjacent • Digraph or directed graph: directed edges • Path: sequence of vertices v1,…,vn where each <vi,vi+1> is an edge. • Cycle: path where v1=vn • Tour: cycle that contains every vertex • DAG: directed acyclic graph • directed graph with no directed cycles • Tree algorithms usually work with DAGs just fine

  5. Adjacency Matrix Representation • Matrix A = new Boolean(|V|,|V|). • O(|V|^2) memory cost: acceptable only if dense • If <vi,vj> is an edge, set A[i][j] = true, else false • Special matrix multiply operator: • row[i] @ col[j] = (row[i][1] & col[1][j]) or (row[i][2] & col[2][j]) or … • In A@A, if entry [i][j] is true, what does that mean? • There is some k so that vi->vk and vk->vj, i.e. a length 2 path from vi to vj. • Similarly, A^k indicates which pairs of vertices are connected by a length k path. • Cost: O(k*n^3).
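
The boolean product described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `bool_product` and the three-vertex example graph are assumptions, not part of the original slides, and indices start at 0 rather than 1.

```python
def bool_product(a, b):
    """Boolean matrix product: AND plays the role of *, OR the role of +."""
    n, k, m = len(a), len(b), len(b[0])
    return [[any(a[i][t] and b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

# Hypothetical directed graph: 0 -> 1 -> 2
A = [[False, True, False],
     [False, False, True],
     [False, False, False]]

# A2[i][j] is True exactly when a path of length 2 leads from i to j.
A2 = bool_product(A, A)
```

Here the only true entry of `A2` is `A2[0][2]`, reflecting the single length 2 path 0 -> 1 -> 2.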

  6. Matrix Review • If A has n rows and k columns and • B has k rows and m columns then A*B = C • Where C has n rows and m columns and (standard) • C[i][j] = A[i][1]*B[1][j]+….+A[i][k]*B[k][j] • or the i-j entry of C is the dot product of row i of A and column j of B. Total time cost is O(n*k*m). • Example: a 3 by 3 matrix represents a linear transformation from R^3 into R^3. • Points in R^3 are represented as a 3 by 1 vector (column) • Essentially matrices stretch/shrink or rotate points. • Determinant defines amount of stretch/shrink. • Theorem: Locally every differentiable function can be approximated by a matrix (linear transformation).

  7. Cost Matrix Representation • Now A[i][j] = cost of edge from vi to vj. • If no edge, either set cost to Infinity or add Boolean attribute to indicate no edge. • New multiplication operation • row[i]@col[j] = min { row[i][1]+col[1][j], row[i][2]+col[2][j], …, row[i][n]+col[n][j] } • Now A^2 contains the minimum cost of a path of length 2 between any 2 vertices. • Computing A^n by repeated multiplication has complexity O(n^4). Not good. • As defined, A^k records the minimum cost over paths of length exactly k. If we add A[i][i] = 0, then A^k records the minimum cost over all paths of length <= k.
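
A minimal sketch of the min/plus multiplication, assuming a square cost matrix with `float("inf")` standing in for "no edge" and zeros on the diagonal. The names `min_plus`, `INF`, and the example costs are illustrative choices, not from the slides.

```python
INF = float("inf")

def min_plus(a, b):
    """Min-plus product: + plays the role of *, min the role of +."""
    n = len(a)
    return [[min(a[i][t] + b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical costs: 0 -> 1 costs 4, 1 -> 2 costs 1, 0 -> 2 costs 10.
# The diagonal is 0, so powers cover paths of at most k edges.
C = [[0, 4, 10],
     [INF, 0, 1],
     [INF, INF, 0]]

C2 = min_plus(C, C)
```

With these numbers `C2[0][2]` is 5: the two-edge route through vertex 1 beats the direct edge of cost 10.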

  8. Adjacency List Representation • Here each vertex is the head of a linked list which stores all the adjacent vertices. The cost to a node, if appropriate, is also stored. • The linked lists are usually stored in an array, since you probably know how many vertices there are. • Memory cost = O(|V|+|E|), so if |E| << |V|^2, use the list representation. • Graph is sparse if |E| is O(|V|). • To simplify the discussion, we will assume that each vertex has a method *sons* which returns all adjacent vertices. • Now we will redo the same problems with the list representation.

  9. General Graph Search Algorithm: looking for a node with a property
       Set Store equal to some node
       while (Store is non-empty) do
         choose a node n in Store
         if n is solution, stop
         else add SOME sons of n to Store
     • Decisions: • What should Store be? • How do we choose a node? • What does add mean? • How do we pick which sons to store? • Cycles are a problem

  10. Problem: Is Graph connected? (matrix rep) Note: A is an n by n boolean matrix where n is the number of vertices. Set A[i][i] to true. Set A[i][j] to true if there is an edge between i and j. Let B = A^2, using boolean arithmetic. Note B[i][j] is true iff there is a k such that A[i][k] is true and A[k][j] is true, i.e. if there is a 2-path from i to j. A^k represents whether a k-path exists between any two vertices. Let C = boolean sum of A^i where i = 1…n-1. (why?) Graph is connected if C is all ones. Time complexity: O(N^4)! How about directed graphs? Basically the same algorithm. For directed graphs, strongly connected means there is a directed path between any two vertices.

  11. Is Undirected Graph G Connected? (adjacency list representation) • Suppose G is an undirected graph with N vertices. Let S be any node. Do a (depth/breadth) first search of G from S, counting the number of nodes reached. Be careful not to double count. If the number of nodes does not equal N, G is disconnected. • Searching a graph is like searching a tree, except that nodes may be revisited. • Need to keep track of revisits, else infinite loop.

  12. Depth First Search Pseudo-Code • Store = Stack • Choose = pop • Initial node: any node • Add = push only new sons (unvisited ones) • keep a boolean field visited, initialized to false. • When a node is “popped”, mark it as visited. • Graph connected if all nodes visited. • Properties • Memory cost: number of nodes • Guaranteed to find a solution, if one exists (not necessarily the shortest solution). How could we guarantee that? • Time: number of nodes (exponential in depth for k-ary trees)
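
The stack-based search above, used as a connectivity test, might look like the following sketch. It assumes the graph is given as a dict mapping every vertex to a list of its neighbours; that representation and the name `is_connected` are assumptions made for this example.

```python
def is_connected(adj):
    """adj maps each vertex of an undirected graph to a list of neighbours."""
    if not adj:
        return True
    start = next(iter(adj))          # initial node: any node
    visited = set()
    stack = [start]                  # Store = stack
    while stack:
        v = stack.pop()              # Choose = pop
        if v in visited:
            continue                 # already handled via another path
        visited.add(v)               # mark when popped
        # Add = push only sons that have not been visited yet
        stack.extend(w for w in adj[v] if w not in visited)
    return len(visited) == len(adj)

connected = is_connected({1: [2], 2: [1, 3], 3: [2]})
split = is_connected({1: [2], 2: [1], 3: []})
```

A node can sit on the stack more than once, which is why the popped node is checked against `visited` before it is counted.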

  13. Breadth First Search • G is an undirected Graph • As before each node has a boolean visited field. • Initial node is arbitrary • Store = Queue • Choose = dequeue and mark as visited • Add = enqueue only those sons that have not been visited • Properties: • Time: Number of nodes to solution • Space: Number of nodes • Guaranteed to find shortest solution (fewest edges)

  14. Is Directed Graph Acyclic? (array representation) • Let A[i][j] be true if there is a directed edge from i to j. • Similar to previous case, if B = A^2 with boolean multiplication, then B[i][j] is true iff there is a directed 2-path from i to j. • Algorithm: For i = 1 to n (why? a cycle can use as many as n edges) Compute A^i. If some diagonal element is true, the graph has a cycle: exit with false. end for Exit with true (acyclic).
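
As a sketch of this matrix test, the function below looks for a true diagonal entry in the boolean powers A^1 through A^n. It checks up to the n-th power because a simple cycle through all n vertices has n edges. The name `has_cycle` and the example graphs are assumptions for illustration.

```python
def has_cycle(a):
    """a is an n by n boolean adjacency matrix of a directed graph."""
    n = len(a)
    power = a                                    # current power A^i, starting at i = 1
    for _ in range(n):
        if any(power[i][i] for i in range(n)):
            return True                          # some vertex reaches itself
        # boolean product: next power = power @ a
        power = [[any(power[i][t] and a[t][j] for t in range(n))
                  for j in range(n)]
                 for i in range(n)]
    return False

T, F = True, False
triangle = [[F, T, F], [F, F, T], [T, F, F]]     # 0 -> 1 -> 2 -> 0
chain = [[F, T, F], [F, F, T], [F, F, F]]        # 0 -> 1 -> 2
```

For `triangle` the diagonal only becomes true at the third power, so stopping at the second power would miss the cycle.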

  15. Is Directed Graph Acyclic? (adjacency list rep) • With care, breadth first search works. • Define the indegree of a node v as the number of edges of the form (u,v), with u arbitrary. • Define the outdegree of a node v as the number of edges of the form (v, u) with u arbitrary. • A node with indegree 0 is like the root of a tree. • A node with outdegree 0 is a terminal node.

  16. Breadth First Search Pseudo-code • Algorithm Idea (has numerous variations/implementations) • Store = Queue
        Compute indegree of all nodes
        Enqueue all nodes of indegree 0
        While Queue is not empty
          Dequeue node n and lower the indegree of each node v with an edge (n,v)
          Enqueue any such node whose indegree becomes 0
        If any node still has positive indegree, then cyclic.
      • Why does algorithm terminate? • Properties: • Time & Space: Number of nodes • find node closest to “roots”.

  17. Best-First Search • Goal: find least cost solution • Here edges have a cost (positive) • Store = priority queue • Add = enqueue(), which puts in right order • Choose = dequeue(), chooses element of least cost • Properties: • Find cheapest solution • Time and Memory: exponential in… • depth of tree.

  18. Best First Pseudo-Code
        Set distance from S to S to 0
        Priority Queue PQ <- S
        while (PQ is not empty)
          vertex <- PQ.dequeue()
          sons <- vertex.sons()
          for each son in sons
            PQ.enqueue(son, cost to son)

  19. Topological Sort • Given: a directed acyclic graph • Produce: a linear ordering of the vertices such that if a path exists from v1 to v2, then v1 is before v2. • If v1 is before v2, is there a path from v1 to v2? • NO • Note: there may be multiple correct topological sorts • Algorithm Idea: • any vertex with indegree 0 can be first • Output and delete that vertex • update indegrees of its sons • Repeat until empty • So we need to compute and keep track of indegrees

  20. Algorithm Implementation • HashTable of (vertex, indegree, sons) • Queue of vertices • Step 1: read each edge (v,w) and add 1 to indegree of w • linear • Step 2: Add all vertices with indegree 0 to queue Q. • Step 3: Process Q by: • dequeue vertex • update indegrees of its sons (constant by hashing) • enqueue any son whose indegree becomes 0. • Time complexity: linear • Space: linear • Proof: Does everything get enqueued?
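
The three steps above translate fairly directly into Python, with dicts standing in for the hash table. This is a sketch under assumptions: the function name `topological_sort`, the `(v, w)` edge tuples, and raising `ValueError` on a cycle are choices made for this example.

```python
from collections import deque

def topological_sort(vertices, edges):
    """vertices: list of hashable names; edges: list of (v, w) meaning v -> w."""
    indegree = {v: 0 for v in vertices}
    sons = {v: [] for v in vertices}
    for v, w in edges:                       # Step 1: count indegrees
        sons[v].append(w)
        indegree[w] += 1
    queue = deque(v for v in indegree if indegree[v] == 0)   # Step 2
    order = []
    while queue:                             # Step 3
        v = queue.popleft()
        order.append(v)
        for w in sons[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(order) != len(indegree):
        # some vertex never reached indegree 0, so it lies on or behind a cycle
        raise ValueError("graph has a cycle")
    return order

order = topological_sort(["a", "b", "c", "d"],
                         [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
```

The final length check also answers the cycle question from the earlier slide: if not every vertex gets enqueued, the graph is cyclic.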

  21. Unweighted Single-Source Shortest Path Algorithm • Input: Directed graph and start node S • Output: the minimum cost, in terms of number of edges traversed, from start node to all other nodes. • Idea: do a level order search (breadth-first search) • We’ll use a hashtable to mark elements as “seen”, • i.e. we’ll track vertices that we’ve visited • use hashtable to hold this information by “marking” vertices that have been visited • We’ll use a queue to store the vertices to be “opened” • to open a node means to consider its sons • Each vertex will have a field for distance to start node.

  22. Shortest Path Algorithm
        Set distance from S to S to 0
        queue <- S
        while (queue is not empty)
          vertex <- queue.dequeue()
          mark vertex as visited (enter in hashtable)
          record cost to vertex
          sons <- vertex.sons()
          newSons <- sons that are unmarked
          fill in distance measure to newSons
          queue.enqueue(newSons)
      Essentially, breadth-first search
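
A compact Python sketch of this breadth-first shortest-path search follows. One deliberate difference from the pseudocode: a vertex is marked (by giving it a distance) when it is enqueued rather than when it is dequeued, so no vertex enters the queue twice. The dict-of-lists graph and the name `bfs_distances` are assumptions for the example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return {vertex: number of edges on a shortest path from source}."""
    dist = {source: 0}               # doubles as the 'seen' hashtable
    queue = deque([source])
    while queue:
        v = queue.popleft()          # open v: consider its sons
        for w in adj.get(v, []):
            if w not in dist:        # only new sons
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

graph = {"S": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
distances = bfs_distances(graph, "S")
```

Vertices that cannot be reached from the source simply do not appear in the returned dict.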

  23. Discussion • Will this terminate? • Will we ever revisit a node? • Computational cost? • O(|E|) • What are the memory requirements? • Suppose we don’t count graph (virtual graphs) • Can we bound queue? • Only O(|E|) and if m-ary tree, this is exponential. • Did we need the hashtable? • This avoids a linear search of the constructed graph • remove a factor of O(|G|).

  24. Positive-Weighted Single-source Cheapest Path • Suppose we have positive costs associated with every edge in a directed graph. • Problem: Find the shortest path (total cost) from given vertex S to every vertex. • Solution: Dijkstra’s algorithm • BFS idea still works, with slight modifications • Replace Queue by Priority queue. • As before, replace newSons by betterSons. • As before, replace add entry by update entry (which may be add) • Note: may reopen an old son (if we return with a better path) • Dense graphs: O(|V|^2), sparse graphs: O(|E|log|V|)

  25. Weighted Shortest Path Pseudo-Code
        Set distance from S to S to 0 (on node)
        Priority Queue PQ <- S
        while (PQ is not empty)
          vertex <- PQ.dequeue() … remove min
          mark vertex as visited (enter in hashtable)
          record cost to vertex
          sons <- vertex.sons()
          goodSons <- new sons OR old sons with better cost estimates
          PQ.enqueue(goodSons) … enqueue puts in proper order.
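
One common way to realise this pseudocode in Python uses the standard `heapq` module. Since `heapq` has no "update entry" operation, the sketch below pushes a fresh entry whenever a better cost is found and skips stale entries when they are popped; that substitution, the name `dijkstra`, and the `(son, cost)` adjacency format are assumptions of this example, not something the slides prescribe.

```python
import heapq

def dijkstra(adj, source):
    """adj maps a vertex to a list of (son, positive cost) pairs."""
    dist = {source: 0}
    done = set()                         # vertices whose cost is final
    pq = [(0, source)]
    while pq:
        d, v = heapq.heappop(pq)         # remove min
        if v in done:
            continue                     # stale entry for an already closed vertex
        done.add(v)
        for w, cost in adj.get(v, []):
            nd = d + cost
            if nd < dist.get(w, float("inf")):   # new son, or old son with better cost
                dist[w] = nd
                heapq.heappush(pq, (nd, w))
    return dist

graph = {"S": [("A", 4), ("B", 1)], "B": [("A", 2)], "A": [("C", 1)]}
costs = dijkstra(graph, "S")
```

In the example, A is first seen at cost 4 and later improved to 3 via B, which is the "old son with a better cost estimate" case.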

  26. Graphs with negative edge costs • Dijkstra doesn’t work (a vertex already marked as finished may later be reached more cheaply, and negative cycles can lower the cost without bound) • Input: Directed graph with arbitrary edge costs and start vertex S. • Output: Minimum cost from S to every vertex OR graph has a negative cost cycle. • Note: If no negative cost cycles, then a vertex can be visited (expanded) at most |V| times. • Algorithm: Add a counter to each vertex so that each time it is visited with lower cost, the counter goes up. If a counter exceeds |V|, then the graph has a negative cost cycle and we exit. Otherwise the queue eventually empties and the recorded costs are the minimums.
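
The counter idea can be sketched as a queue-based search in which every vertex counts how many times it has been expanded (dequeued). The function name, the `(son, cost)` adjacency format, and returning `None` for a negative cost cycle are assumptions made for this illustration.

```python
from collections import deque

def negative_edge_shortest_paths(adj, source):
    """adj maps a vertex to a list of (son, cost); costs may be negative.

    Returns a dict of minimum costs, or None if a negative cost cycle
    is reachable from source.
    """
    vertices = set(adj) | {w for sons in adj.values() for w, _ in sons} | {source}
    dist = {source: 0}
    expansions = {v: 0 for v in vertices}    # per-vertex counter
    queue = deque([source])
    queued = {source}
    while queue:
        v = queue.popleft()
        queued.discard(v)
        expansions[v] += 1
        if expansions[v] > len(vertices):
            return None                      # counter exceeded |V|
        for w, cost in adj.get(v, []):
            nd = dist[v] + cost
            if nd < dist.get(w, float("inf")):
                dist[w] = nd                 # reached with lower cost
                if w not in queued:
                    queue.append(w)
                    queued.add(w)
    return dist

ok = negative_edge_shortest_paths(
    {"S": [("A", 4), ("B", 5)], "B": [("A", -3)], "A": [("C", 1)]}, "S")
bad = negative_edge_shortest_paths(
    {"S": [("A", 1)], "A": [("B", -2)], "B": [("A", 1)]}, "S")
```

In the second graph the cycle A -> B -> A has total cost -1, so the costs never settle and the counter check ends the search.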

  27. Weighted Single-Source shortest-path problems for Acyclic graphs • Easy since no cycles • Edge costs may be positive or negative • Best-first search works • Node may be reentrant • A reentrant node may require updating its cost. • Or apply the topological sorting algorithm. (text)

  28. Algorithm Display • Idea: • Iterative use of breadth first search • addition of edges to effect other choices • See Diagrams provided • Analysis: (requires augmented path be cheapest) • Runs in linear time

  29. Minimum Spanning Tree • Given: an undirected connected graph with edge costs • Output: a subtree of graph such that • contains all vertices • sum of costs of edges is minimum • If costs not given, assume 1. What then? • Note all spanning trees have same number of edges • Application: • Is undirected Graph with n vertices connected? • IFF minimal spanning tree has n-1 edges.

  30. Prim’s Algorithm • Let G be given as (V,E) where V has n vertices • Let T = empty • Algorithm Idea: grow cheapest tree • Choose a random v to start and add to T • Repeat (until T has n vertices) • select edge of minimum length that does not form a cycle and that attaches to current tree (how to check?) • add edge to T • The proof is more difficult than the code. • Complexity depends on G and code • O(V^2) for dense graphs • O(E*log(V)) for sparse graphs (use binary heap)
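
A heap-based sketch of Prim's algorithm, corresponding to the sparse-graph case, is given below. The cycle check is done by tracking which vertices are already in the tree: an edge whose far end is in the tree would close a cycle. The name `prim` and the dict-of-`(neighbour, cost)` representation are assumptions for this example.

```python
import heapq

def prim(adj, start):
    """adj maps each vertex of a connected undirected graph to (neighbour, cost) pairs.

    Returns the tree edges as (from, to, cost) triples.
    """
    in_tree = {start}
    tree_edges = []
    pq = [(cost, start, w) for w, cost in adj[start]]
    heapq.heapify(pq)
    while pq and len(in_tree) < len(adj):
        cost, v, w = heapq.heappop(pq)       # cheapest edge leaving the tree
        if w in in_tree:
            continue                         # would form a cycle
        in_tree.add(w)
        tree_edges.append((v, w, cost))
        for x, c in adj[w]:
            if x not in in_tree:
                heapq.heappush(pq, (c, w, x))
    return tree_edges

graph = {
    "a": [("b", 1), ("c", 3)],
    "b": [("a", 1), ("c", 2)],
    "c": [("a", 3), ("b", 2), ("d", 4)],
    "d": [("c", 4)],
}
mst = prim(graph, "a")
```

For this graph the tree uses the edges of cost 1, 2 and 4, and the edge a-c of cost 3 is rejected because both ends are already in the tree.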

  31. Kruskal’s Algorithm • Given graph G = (V,E) • Sort edges on the basis of cost. • Add least cost edge to Forest, as long as no cycle is formed. • Cost of cycle checking is? • If implement as adjacency list, O(E^2) • If implement as hash table O(1) • Proof more difficult. • Time complexity: O(E log E)
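
Kruskal's algorithm can be sketched with a union-find (disjoint set) structure for the cycle check. Note that union-find is swapped in here as a common choice; the slide only mentions an adjacency list or a hash table. The name `kruskal` and the `(cost, v, w)` edge tuples are assumptions for this example.

```python
def kruskal(vertices, edges):
    """edges: list of (cost, v, w) for an undirected graph.

    Returns the chosen edges as (v, w, cost) triples.
    """
    parent = {v: v for v in vertices}        # each vertex starts as its own tree

    def find(v):
        # walk to the root, halving the path along the way
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for cost, v, w in sorted(edges):         # cheapest edges first
        rv, rw = find(v), find(w)
        if rv != rw:                         # different trees: no cycle is formed
            parent[rv] = rw
            forest.append((v, w, cost))
    return forest

mst = kruskal(["a", "b", "c", "d"],
              [(3, "a", "c"), (1, "a", "b"), (4, "c", "d"), (2, "b", "c")])
```

Sorting dominates the running time, which matches the O(E log E) bound quoted on the slide.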

  32. Finding the least cost between pairs of points • Idea: Dynamic programming • Let c[i][j] be the edge cost between vi and vj. • Define C[i][j] as minimum cost for going from vi to vj. • Finding the subproblems • Suppose P is the path from vi to vj which realizes the minimum cost and vk is an intermediary node. • Then the subpaths from i to k and from k to j must be optimal, otherwise P would not be optimal. • Now define D[i][k][j] as the minimum cost for going from vi to vj using any of v1,v2..,vk as an intermediary. • Define D[i][j] as the minimum cost for going from i to j. • D[i][j] = min over k of D[i][k][j] and c[i][j].

  33. All-Pairs (Floyd’s) Pseudo-Code • Initialization: • D[i][0][j] = cost(i,j) for all vertices i, j … O(|V|^2) • D[i][k+1][j] = • min(D[i][k][j], D[i][k][k+1]+D[k+1][k][j]) • This last statement is true since the shortest path from vi to vj using {v1,…,vk+1} either doesn’t use vk+1, or divides into a path from vi to vk+1 and one from vk+1 to vj, each using only {v1,…,vk}. • The cost of this is O(|V|^3), i.e. a single loop over all vertices with |V|^2 work per iteration.
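
In code, the middle index k does not need its own dimension: the table can be updated in place, one intermediate vertex at a time. The sketch below assumes a square cost matrix with `float("inf")` for missing edges and 0 on the diagonal; the names `floyd` and `INF` and the example costs are illustrative.

```python
INF = float("inf")

def floyd(cost):
    """Return the matrix of minimum costs between all pairs of vertices."""
    n = len(cost)
    d = [row[:] for row in cost]             # D with no intermediates allowed
    for k in range(n):                       # now also allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

cost = [[0, 4, 10],
        [INF, 0, 1],
        [INF, INF, 0]]
best = floyd(cost)
```

The outer loop over k performs |V|^2 work per iteration, giving the O(|V|^3) total stated above.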

  34. NP vs P • Multiple ways to define • Define new computational model (imaginary) • add to programming language • choose S1, S2, …, Sn; where Si are statements • Semantics: algorithm always chooses best Si to execute. • This is the non-deterministic model • If a problem can be solved in polynomial time on a non-deterministic machine, it is in the class NP. • If a problem can be solved in polynomial time on a standard (deterministic) computer, then it is in the class P. • Unsolved (and possibly unsolvable): does NP = P?

  35. NP-Completeness • A problem is in NP if it can be solved in polynomial time on a non-deterministic machine. • A problem p* is NP-hard if any problem in NP can be polynomially reduced to p*; it is NP-complete if it is also in NP. • A problem P1 can be polynomially reduced to P2 if P1 can be solved in polynomial time assuming that P2 can be solved in polynomial time. • Alternatively, if P1 can be transformed into P2 and solutions of P2 mapped back to P1, and all the transformations take polynomial time. • This is a way of forming a taxonomy of difficulty of various problems

  36. NP-Complete problems • Boolean Satisfiability • Traveling Salesman • Bin Packing: given packages of size a[1]…a[n] and bins of size k, what is the fewest number of bins needed to store all the packages? • Scheduling: Given tasks that take times t[1]…t[n] and k processors, what is the minimum completion time? • Graph: Given a graph, find the clique of maximum size. • A clique is a completely connected subgraph. • Subset-sum: Given a finite set S of n numbers and a target number t, does some subset of S sum to t? • Vertex Cover: A vertex cover is a subset of vertices which hits every edge. The problem is to find a cover with the fewest number of vertices.
