CS305/503, Spring 2009 Graph Algorithms

CS305/503, Spring 2009Graph Algorithms Michael Barnathan

Here’s what we’ll be learning: • Data Structures: • Graphs (Adjacency Matrix Representation). • Theory: • Dijkstra’s algorithm. • Floyd’s algorithm. • Traveling salesman problem (TSP). • Dynamic programming. • And then we’re done. • These are all of the data structures taught in an algorithms course at Monmouth. • If you’ve learned them all, you will have an excellent foundation in this field, suitable for furthering your study in graduate school or industry.

Traditional Graph Representation Vertices G = 1 3 2 5 Edges 4 V = 1 2 3 4 5 E =

Adjacency Matrices • A graph G can also be represented as an adjacency matrix. • Let v be the number of vertices in G. • The adjacency matrix A is then a v x v binary matrix indicating the presence of edges between nodes: • Ai,j = 1 if an edge exists between vertices i and j. • Ai,j = 0 otherwise. • If the graph is undirected, A will be symmetric. • That is, an edge between vertices i and j means an edge also exists between vertices j and i. • This is not necessarily true if the graph is directed.

Adjacency Matrix Example G = 1 3 2 5 4 A =

Adjacency Matrix Tradeoffs • O(n2) space required to store the matrix, but each entry can be represented with one bit. • More efficient representation in very dense graphs, but fails overall. • More natural for use in certain algorithms. • O(n) time required to retrieve all edges of a vertex, even if the vertex has only one edge. • Because it also stores the 0s, scanning a vertex’s edges requires scanning a whole row. • The traditional (adjacency list) representation takes O(e) time, where e is the number of edges adjacent to the vertex you are examining. e is much smaller than n.

Adjacency Lists • The traditional method of representing edges is called an adjacency list. • Simple: Every vertex contains a linked list of edges it is adjacent to. • The edge typically stores the vertices on both ends as well, to allow the traversal across that edge to be constant time. • If edges did not keep track of their vertices, traversing one would require linearly scanning all vertices for adjacency.

Converting Vertex[] mat2list(boolean[][] adjmatrix) { Vertex[] ret = new Vertex[adjmatrix.length]; for (int vidx = 0; vidx < adjmatrix.length; vidx++) ret[vidx] = new Vertex(); //Initialize vertices. for (int vidx = 0; vidx < adjmatrix.length; vidx++) for (int eidx = 0; eidx < adjmatrix[vidx].length; eidx++) if (adjmatrix[vidx][eidx]) //If the matrix has a 1, add an edge to the list. ret[vidx].edges().add(new Edge(ret[vidx], ret[eidx])); } return ret; } boolean[][] list2mat(Vertex[] adjlist) { boolean[][] ret = new boolean[adjlist.length][adjlist.length]; //Default value is false. for (int vidx = 0; vidx < adjlist.length; vidx++) { for (Edge eadj : adjlist.edges()) ret[vidx][eadj.getOtherVertexIndex()] = true; //Edges in the list are true. } return ret; }

Weighted Adjacency Matrices • Weights can be added to adjacency matrices as well. • Rather than using 1s to represent edges, use the edge weights. • Missing edges are represented by infinity. • Infinity in Java is Double.POSITIVE_INFINITY.

Shortest Path Problem • You are working for a new software company called 10100, Inc. • They just received access to a road database for their new product, 10100 Maps. • You are asked to develop the algorithm that computes the fastest route from point A to point B. • For example, from Monmouth University to Carnegie Hall.

Example Graph Carnegie Hall 27 min. 24 min. 12 min. Jersey City 10 years. (“Practice, practice, practice”) Newark 44 min. 43 min. 54 min. Old Bridge 20 min. Freehold 34 min. 27 min. Monmouth University What is the quickest way to Carnegie Hall?

Dijkstra’s Algorithm • Named after Edsger Dijkstra, who discovered it in 1959. • Also called the shortest-path algorithm, which should tell you what it does. • Of course, it finds the shortest path from one node to another (or to all others) in a graph. • Key insight: if you have found the shortest path from Old Bridge to Carnegie and Freehold to Carnegie, you will not need to calculate the path from Freehold to Old Bridge. Going directly to Old Bridge is faster. • Caveat: this is not true if negative weights exist in the graph. In this case, maybe going from Freehold to Old Bridge saves you time and the link must still be checked! • Dijkstra’s algorithm only works when all weights are non-negative.

Dijkstra’s Algorithm Overview: • Declare an array, dist, of shortest path lengths to each vertex. Initialize the distance of the start vertex to 0 and every other vertex to infinity. • Create a priority queue and fill it with all nodes in the graph. • While the queue is not empty: • Remove the vertex u with the smallest distance from the start. • Compute the minimum distance md between u and each neighboring vertex v (scan u’s edges and choose the one with the smallest weight). • For each neighbor v, if dist[u] + md < dist[v], • Set dist[v] = dist[u] + md • Set v’s “predecessor vertex” to u. This is used to retrace the path. • (We have found a shorter path to v than our current best). • Trace the path back from the target to the source by traversing the predecessor node of each vertex from the target. Reverse it and you have the shortest path from source to target.

Dijkstra’s Algorithm – Starting State ∞ Carnegie Hall 27 min. 24 min. ∞ ∞ 12 min. Jersey City Newark 10 years. 44 min. 43 min. ∞ 54 min. Old Bridge ∞ 20 min. Freehold 34 min. 27 min. 0 Monmouth University

Dijkstra’s Algorithm – First Iteration 10 years Carnegie Hall 27 min. 24 min. ∞ ∞ 12 min. Jersey City Newark 10 years. 44 min. 43 min. 34 So far practicing is winning. Maybe your piano teacher was right… 54 min. Old Bridge 27 20 min. Freehold 34 min. 27 min. 0 Monmouth University

Dijkstra’s Algorithm – Second Iteration 10 years Carnegie Hall 27 min. 24 min. 78 77 12 min. Jersey City Newark 10 years. 44 min. 43 min. 34 54 min. 34 < 47, so Old Bridge keeps its current predecessor. Old Bridge 27 20 min. Freehold 77 < 81, so Newark’s predecessor is Old Bridge. 34 min. 27 min. 0 Monmouth University

Dijkstra’s Algorithm – Third Iteration 102 Carnegie Hall 27 min. 24 min. 78 77 12 min. Jersey City Newark 10 years. 44 min. 43 min. 34 54 min. 102 < 104 and 102 < 10 years, so Carnegie Hall changes its predecessor to Jersey City. Old Bridge 27 20 min. Freehold 34 min. 27 min. 0 Monmouth University

Dijkstra’s Algorithm – Results: 102 The shortest time to Carnegie Hall is thus 102 minutes. Starting at Carnegie Hall and traversing its predecessor list, we see that we passed through Jersey City and Old Bridge. Therefore, the optimal route is Monmouth -> Old Bridge -> Jersey City -> Carnegie Hall. Carnegie Hall 24 min. 78 Jersey City 44 min. 34 Old Bridge 34 min. 0 Monmouth University

Dijkstra’s Algorithm – Pseudocode: Vertex[] Dijkstra(Vertex[] graph, int sourceidx) { double[] dist = new double[graph.length]; Vertex[] predecessor = new Vertex[graph.length]; PriorityQueue<Vertex> vertq = new PriorityQueue<Vertex>(); for (int vidx = 0; vidx < graph.length; vidx++) { dist[vidx] = Double.POSITIVE_INFINITY; vertq.add(graph[vidx]); } dist[sourceidx] = 0; while (!vertq.empty()) { Vertex cur = vertq.pop(); for (Edge adjedge : cur.edges()) { Vertex other = adjedge.getOtherVertex(); if (dist[cur] + adjedge.getWeight() < dist[other]) { dist[other] = dist[cur] + adjedge.getWeight(); predecessor[other] = cur; } } } return predecessor; //All shortest paths from the source node are contained here. }

Dijkstra’s Algorithm – Analysis: • What is the time complexity of this algorithm? • Assuming a linear search is performed on the priority queue when removing the element? • Assuming the traditional heap implementation of a priority queue (which is tricky in this case because the distance changes throughout the algorithm)? • Hint: it will depend on both V and E. • How much space is being used?

Dijkstra’s Algorithm – Discussion: • This algorithm always chooses the path of shortest distance to record at each step. • What did we call those algorithms again? • When it finishes, the recorded path will be the absolute shortest from the source. • Dijkstra’s algorithm will fail if given a graph with negative weights. Use the Bellman-Ford algorithm (which we won’t discuss) for this.

Floyd’s Algorithm • Also called the Floyd-Warshall algorithm. • This algorithm reports shortest paths between ALL pairs of nodes in the graph. • This algorithm also does not work when negative weights exist in the graph. • You could also run Dijkstra’s algorithm for each vertex in the graph, but you would be repeating work, and it would cost you: O(v3 * e), to be precise. • In the worst case, e = v2, so this algorithm could cost O(v5). • Floyd’s algorithm improves this to O(v3). • It uses a technique called dynamic programming to do this.

Floyd’s Algorithm double[][] floyd(double[][] graph) { //Weighted Adjacency matrix representation. double[][] pathlen = (double[][]) graph.clone(); for (int start1 = 0; start1 < graph.length; start1++) for (int start2 = 0; start2 < graph.length; start2++) for (int end = 0; end < graph.length; end++) pathlen[start2][end] = Math.min(pathlen[start2][end], pathlen[start2][start1] + pathlen[start1][end]); return pathlen; }

Recall: Divide and Conquer • Divide and Conquer is an algorithm design paradigm that splits large problems up into smaller instances of the same problem, solves the smaller problems, then merges them to get a solution to the original problem. • When a table of solutions to subproblems is kept to avoid redoing work, this is called memoization.

Dynamic Programming • Floyd’s algorithm is a dynamic programming algorithm. • The idea behind dynamic programming is similar to memoization. • Whereas memoization begins with large problems and breaks them down, dynamic programming builds large solutions from smaller problems. • Dynamic programming is used in problems with overlapping substructure: when a problem can be split “horizontally” into overlapping subproblems, which can be merged back later: • For example, computing path[1][5] would involve computing path[1][2] + path[2][5], path[1][3] + path[3][5], and path[1][4] + path[4][5]. • The problem space can be partitioned into subsets of itself and those subsets can be merged together to solve the full problem. • A table is still required to store the solutions to the subproblems. • While memoization has a naturally recursive structure, dynamic programming algorithms often involve computations within a loop.

The Traveling Salesman Problem • Let’s say you’re in charge of planning FedEx’s delivery route. • You have packages to deliver in New York, Denver, Chicago, and Boston. • Gas is expensive for the company, so you’d like to find the route with the shortest distance required to deliver all of the packages.

Graph Representation 982 mi. Boston Chicago 1001 mi. 791 mi. 215 mi. 1777 mi. New York Denver Starting from New York, which route minimizes the total distance?

The Naïve Algorithm • Compute all permutations of edges and sum the path lengths. Select the smallest. • This is equivalent to “topological sorting” the graph, and takes O(v!) time. • Dynamic programming can get this down to O(v22v), but it’s still exponential. • The million dollar question: Is there any way to solve this problem in less than exponential time? • Literally. Find one or prove one can’t exist and you’ll win $1 million.

NP Completeness • TSP is an example of an NP Complete problem. • These are problems whose solutions can be verified in polynomial time, but (probably) can’t be computed in polynomial time. • All NP complete problems can be reduced to each other; they form a complexity class. • The open (million dollar) question is whether the complexity classes P and NP are equal. • Finding a polynomial-time algorithm for even one of these problems, or proving that no such algorithm exists, is sufficient to prove P = NP or P != NP.

Approximations • So is UPS out of luck? • Not entirely… it turns out that there are many approximation algorithms or heuristics for NP-complete problems that will run in polynomial time. • Some of these give very good estimates. Certainly good enough when the question is one of driving distance. • Continuing this discussion is likely outside of this course’s scope.

Other Graph Problems • There are many open problems in graph theory. • Vertex covers. • Cliques. • Flow. • Graph coloring. • Knight’s tours. • With the rise of social networks, this is becoming a more and more relevant field. --xkcd

Not the Shortest Lecture • We rounded out the topics usually taught in an algorithms course with Dijkstra’s and Floyd’s shortest-path algorithms and briefly discussed the notion of NP completeness in the Traveling Salesman Problem. • The lesson: • Slight variations on problems may not seem to make them harder, but may in fact make them intractable. It isn’t always apparent.

CS305/503, Spring 2009 Graph Algorithms

CS305/503, Spring 2009 Graph Algorithms

Presentation Transcript

CS305, HCI in Software Development Usability Goals, Principles, Guidelines