Algorithm Design and Analysis

LECTURE 4 • Graphs • Definitions • Traversals Algorithm Design and Analysis CSE 565 Adam Smith A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Exercise • How can you simulate an array with two unbounded stacks and a small amount of memory? • (Hint: think of a tape machine with two reels) • What is the cost per operation? • What if you only have one stack and constant memory? Can you still simulate arbitrary access to an array? • (Hint: think about pushdown automata.) • No. For example, you can’t check three unary numbers for equality. A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Graphs A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Graphs (KT Chapter 3) Definition. A directed graph (digraph)G = (V, E) is an ordered pair consisting of • a set V of vertices (synonym: nodes), • a set EÍVV of edges • An edge e=(u,v) goes “from u to v” (may or may not allow u=v) • In an undirected graphG = (V, E),the edge set E consists of unordered pairs of vertices • Sometimes write e={u,v} • How many edges can a graph have? • In either case, |E| = O(V 2). A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Graphs are everywhere A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Paths and Connectivity • Path = sequence of consecutive edges in E • (u,w1), (w1,w2), (w2,w3), …, (wk-1, v) • Write u↭vor u↝v • (Note: in a directed graph, direction matters) • Undirected graph G is connected if for every two vertices u,v, there is a path from u to vin G A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Cycles • Def. A cycle is a path v1, v2, …, vk-1, vk in which v1 = vk, k > 2, and the first k-1 nodes are all distinct. cycle C = 1-2-4-5-3-1

Exercises • Suppose an undirected graph G is connected. • True or false? G has at least n-1 edges. • Suppose that an undirected graph G has exactly n-1 edges (and no self-loops) • True or false? G is connected. • What if G has n-1 edges and no cycles? A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Trees • Def.An undirected graph is a tree if it is connected and does not contain a cycle. • Theorem. Let G be an undirected graph on n nodes. Any two of the following statements imply the third. • G is connected. • G does not contain a cycle. • G has n-1 edges.

Rooted Trees • Rooted tree:Given a tree T, choose a root node r and orient each edge away from r. • Models hierarchical structure. root r parent of v v child of v a tree the same tree, rooted at 1

Phylogeny Trees • Phylogeny trees. Describe evolutionary history of species.

Parse Trees • Internal representation used by compiler, e.g.: if (A[x]==2) then (322 + (a*64 +12)/8) else fibonacci(n) if-then-else == fn-call + / fibonacci n array ref 2 power + 8 A x 32 2 * 12 a 64 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Paths and Connectivity • Directed graph? • Strongly connected if for every pair, u↝vand v↝u A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Exploring a graph Classic problem: Given vertices s,t ∈ V, is there a path from s to t? Idea: explore all vertices reachable from s Two basic techniques: • Breadth-first search (BFS) • Explore children in order of distance to start node • Depth-first search (DFS) • Recursively explore vertex’s children before exploring siblings How to convert these descriptions to precise algorithms? A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

L1 L2 s L n-1 Breadth First Search • BFS intuition. Explore outward from s in all possible directions, adding nodes one "layer" at a time. • BFS algorithm. • L0 = { s }. • L1 = all neighbors of L0. • L2 = all nodes that do not belong to L0 or L1, and that have an edge to a node in L1. • Li+1 = all nodes that do not belong to an earlier layer, and that have an edge to a node in Li. • Theorem. For each i, Li consists of all nodes at distance exactly ifrom s. There is a path from s to tifft appears in some layer.

Breadth First Search L0 L1 L2 L3

Breadth First Search • Distance(u,v): number of edges on shortest path from u to v • Properties. Let T be a BFS tree of G = (V, E). • Nodes in layer i have distance i from root s • Let (x, y) be an edge of G. Then the level of x and y differ by at most 1. L0 L1 L2 L3

Following material not covered in lecture but important to review A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

BFS App: Connected Component • Connected component. Find all nodes reachable from s. • Connected component containing node 1 = { 1, 2, 3, 4, 5, 6, 7, 8 }.

Flood Fill • Flood fill. Given lime green pixel in an image, change color of entire blob of neighboring lime pixels to blue. • Node: pixel. • Edge: two neighboring lime pixels. • Blob: connected component of lime pixels. recolor lime green blob to blue

Connected Component • Connected component. Find all nodes reachable from s. • Theorem. Upon termination, R is the connected component containing s. • BFS = explore in order of distance from s. • DFS = explore in a different way. R s u v it's safe to add v

BFS example in a directed graph A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

BFS also works in directed graphs A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Implementing Traversals Generic traversal algorithm • R = {s} • While there is an edge (u,v) where u ∈ R and v ∉ R, • Add v to R To implement this, need to choose… • Graph representation • Data structures to track… • Vertices already explored • Edge to be followed next These choices affect the order of traversal A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

1 if (i, j) ∈ E, 0 if (i, j) E. A[i, j] = A 1 2 3 4 1 0 1 1 0 2 1 • Storage: ϴ(V 2) • Good for dense graphs. • Lookup: O(1) time • List all neighbors: O(|V|) 2 0 0 1 0 3 0 0 0 0 3 4 4 0 0 1 0 Adjacency-matrix representation The adjacency matrix of a graph G = (V, E), where V = {1, 2, …, n}, is the matrix A[1 . . n, 1 . . n] given by A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

2 1 3 4 Adjacency list representation • An adjacency list of a vertex v ∈ V is the list Adj[v] of vertices adjacent tov. Typical notation:n = |V| = # verticesm = |E| = # edges Adj[1] = {2, 3} Adj[2] = {3} Adj[3] = {} Adj[4] = {3} For undirected graphs,|Adj[v] | = degree(v). For digraphs,|Adj[v] | = out-degree(v). How many entries in lists? 2|E| Total ϴ(V + E) storage — good for sparse graphs. A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

BFS with adjacency lists • d[1 .. n]: array of integers • initialized to infinity • use to track distance from root (infinity = vertex not yet explored) • Queue Q • initialized to empty • Tree T • initialized to empty A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

BFS pseudocode BFS(s): • Set d[s]=0 • Add s to Q • While (Q not empty) • Dequeue (u) • For each edge (u,v) adjacent to u • If d[v] == ∞ then • Set d[v] =d[u]+1 • Add edge (u,v) to tree T • Enqueuev onto Q A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Theorem: BFS takes O(m+n) time BFS(s): • Set Discovered[s]=1 • Add s to Q • While (Q not empty) • Dequeue (u) • For each edge (u,v) adjacent to u • If Discovered[v]= false then • Set Discovered[v] =true • Add edge (u,v) to tree T • Add v to Q O(1) time, run once overall. O(1) time, run once per vertex O(1) time per execution, run at most twice per edge Total: O(m+n) time (linear in input size) A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Notes • If s is the roof of BFS tree • For every vertex u, • path in BFS tree from s to u is a shortest path in G • depth in BFS tree = distance from u to s • Proof of BFS correctness: see KT, Chapter 3. A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

BFS Review • Recall: Digraph G is strongly connected if for every pair of vertices, s↝tand s↝t • Question: Give an algorithm for determining if a graph is connected. What is the running time? A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

Strong Connectivity: Algorithm • Lemma: G is strongly connected if and only if for any node s, every other node t has paths to and from s. • Theorem.Can determine if G is strongly connected in O(m + n) time. • Pf. • Pick any node s. • Run BFS from s in G. • Run BFS from s in Grev. • Return true iffall nodes reached in both BFS executions. • Correctness follows immediately from the previous lemma. reverse orientation of every edge in G not strongly connected strongly connected

Directed Acyclic Graphs • Def. An DAG is a directed graph that contains no directed cycles. • Ex. Precedence constraints: edge (vi, vj) means vi must precede vj. • Def. A topological order of a directed graph G = (V, E) is an ordering of its nodes as v1, v2, …, vn so that for every edge (vi, vj) we have i < j. v2 v3 v6 v5 v4 v1 v2 v3 v4 v5 v6 v7 v7 v1 a DAG a topological ordering

Precedence Constraints • Precedence constraints. Edge (vi, vj) means task vi must occur before vj. • Applications. • Course prerequisite graph: course vi must be taken before vj. • Compilation: module vi must be compiled before vj. Pipeline of computing jobs: output of job vi needed to determine input of job vj.

Directed Acyclic Graphs • Lemma. If G has a topological order, then G is a DAG. • Pf. (by contradiction) • Suppose that G has a topological order v1, …, vn and that G also has a directed cycle C. Let's see what happens. • Let vi be the lowest-indexed node in C, and let vj be the node just before vi; thus (vj, vi) is an edge. • By our choice of i, we have i < j. • On the other hand, since (vj, vi) is an edge and v1, …, vn is a topological order, we must have j < i, a contradiction. ▪ the directed cycle C v1 vi vj vn the supposed topological order: v1, …, vn

Directed Acyclic Graphs • Lemma. If G has a topological order, then G is a DAG. • Q. Does every DAG have a topological ordering? • Q. If so, how do we compute one?

Directed Acyclic Graphs • Lemma. If G is a DAG, then G has a node with no incoming edges. • Pf. (by contradiction) • Suppose that G is a DAG and every node has at least one incoming edge. Let's see what happens. • Pick any node v, and begin following edges backward from v. Since v has at least one incoming edge (u, v) we can walk backward to u. • Then, since u has at least one incoming edge (x, u), we can walk backward to x. • Repeat until we visit a node, say w, twice. • Let C denote the sequence of nodes encountered between successive visits to w. C is a cycle. ▪ w x u v

Directed Acyclic Graphs • Lemma. If G is a DAG, then G has a topological ordering. • Pf. (by induction on n) • Base case: true if n = 1. • Given DAG on n > 1 nodes, find a node v with no incoming edges. • G - { v } is a DAG, since deleting v cannot create cycles. • By inductive hypothesis, G - { v } has a topological ordering. • Place v first in topological ordering; then append nodes of G - { v } • in topological order. This is valid since v has no incoming edges. ▪ DAG v

Topological Sorting Algorithm: Running Time • Theorem. Algorithm finds a topological order in O(m + n) time. • Proof. • Maintain the following information: • count[w] = remaining number of incoming edges • S = set of remaining nodes with no incoming edges • Initialization: O(m + n) via single scan through graph. • Update: to delete v • remove v from S • decrement count[w] for all edges from v to w, and add w to S if ccount[w] hits 0 • this is O(1) per edge ▪ • Alternative algorithm: Modification of DFS (exercise?)

Algorithm Design and Analysis