Dynamic Programming

Dynamic Programming JinJu Lee & Beatrice Seifert CSE 5311 Fall 2005 Week 10 (Nov 1 & 3)

Contents
• Max Flow Min Cut Theorem
• Edmonds-Karp Algorithm
• Bipartite Graph Matching Problem
• Dynamic Programming Paradigm
  - Fibonacci Sequence
  - Longest Common Subsequence Problem
  - String Matching Algorithms
    - Substring Matching
    - Knuth-Morris-Pratt Algorithm

Max Flow Min Cut Theorem
• The Ford-Fulkerson method repeatedly augments the flow along augmenting paths until a maximum flow has been found.
• The max-flow min-cut theorem tells us that a flow is maximum if and only if its residual network contains no augmenting path.

Max Flow Min Cut Theorem
• A cut of a flow network G = (V, E) is a partition of V into S and T = V - S such that s ∈ S and t ∈ T
• If f is a flow, the net flow across the cut (S, T) is defined to be f(S, T)
• The capacity of the cut is c(S, T)
• Minimum cut: a cut whose capacity is minimum over all cuts of the network

Max Flow Min Cut Theorem
[Figure: flow network on the vertices s, v1, v2, v3, v4, t with the cut (S, T) marked; edge labels (flow/capacity): 12/12, 15/20, 11/16, 10, 1/4, 4/9, 7/7, 4/4, 8/13, 11/14]
• The net flow across this cut:
  f(v1, v3) + f(v2, v3) + f(v2, v4) = 12 + (-4) + 11 = 19
• The capacity of this cut:
  c(v1, v3) + c(v2, v4) = 12 + 14 = 26
• The net flow across any cut is the same, and it equals the value of the flow.
• The value of a maximum flow is in fact equal to the capacity of a minimum cut.

Edmonds-Karp Algorithm
• An improved version of the Ford-Fulkerson algorithm
• Uses BFS to find the augmenting path instead of using DFS
• The augmenting path is then a shortest path from s to t in the residual network, where every edge counts as length 1
• Running time of the Edmonds-Karp algorithm: O(VE^2)
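The BFS-based augmentation described above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `edmonds_karp`, the edge-list input format, and the example network are assumptions. The example edge list is modeled on the capacities visible in the cut figure, so treat it as a plausible reconstruction rather than the exact figure.

```python
from collections import defaultdict, deque


def edmonds_karp(edges, s, t):
    """Return the value of a maximum flow from s to t.

    edges is an iterable of (u, v, capacity) tuples (assumed input format).
    """
    # residual[u][v] is the remaining capacity from u to v.
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0  # make sure the reverse entry exists

    flow = 0
    while True:
        # BFS finds an augmenting path with the fewest edges.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left, so the flow is maximum

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount along the path.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Assumed example network (reconstructed, see the note above).
example_edges = [
    ("s", "v1", 16), ("s", "v2", 13), ("v1", "v2", 10), ("v2", "v1", 4),
    ("v1", "v3", 12), ("v3", "v2", 9), ("v2", "v4", 14), ("v4", "v3", 7),
    ("v3", "t", 20), ("v4", "t", 4),
]
print(edmonds_karp(example_edges, "s", "t"))  # prints 23
```

For this network the cut S = {s, v1, v2, v4} has capacity 12 + 7 + 4 = 23, which matches the computed flow value, as the max-flow min-cut theorem requires.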

Max Bipartite Matching Problem
• Matching: given an undirected graph G = (V, E), a subset of edges M ⊆ E such that for all vertices v ∈ V, at most one edge of M is incident on v.
• Maximum matching: a matching of maximum cardinality, that is, a matching M such that for any matching M’, we have |M| ≥ |M’|

Max Bipartite Matching Problem
Figure 26.7: A bipartite graph G = (V, E) with vertex partition V = L ∪ R. (a) A matching with cardinality 2. (b) A maximum matching with cardinality 3.

Finding a Max Bipartite Matching
• Use the Ford-Fulkerson method
• Original problem: undirected bipartite graph
• Solution:
  - construct a flow network: add a source s with an edge to every vertex in L, and a sink t with an edge from every vertex in R
  - direct every original edge from L to R
  - assign unit capacity to each edge
• Running time: O(VE)

Finding a Max Bipartite Matching
[Figure: (a) The bipartite graph from Figure 26.7. (b) The corresponding flow network with a maximum flow shown.]

Finding a Max Bipartite Matching
[Figure: flow networks from s to t in which every edge has capacity 1.]
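Because every edge has unit capacity, each Ford-Fulkerson augmentation in this network adds exactly one matched pair. The sketch below exploits that and searches for augmenting paths directly on the bipartite graph, leaving s and t implicit. The function name `max_bipartite_matching`, the adjacency-dict input, and the vertex names are assumptions made for illustration.

```python
def max_bipartite_matching(adj):
    """Return a maximum matching as a dict {left vertex: right vertex}.

    adj maps each vertex in L to the list of its neighbours in R
    (assumed input format).
    """
    match_of_right = {}  # right vertex -> left vertex matched to it

    def try_augment(u, visited):
        # Look for an augmenting path that starts at left vertex u.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or the vertex currently matched to v can be re-matched.
            if v not in match_of_right or try_augment(match_of_right[v], visited):
                match_of_right[v] = u
                return True
        return False

    for u in adj:
        try_augment(u, set())  # one augmentation attempt per left vertex
    return {u: v for v, u in match_of_right.items()}


# Hypothetical example graph.
graph = {"l1": ["r1", "r2"], "l2": ["r1"], "l3": ["r2", "r3"]}
matching = max_bipartite_matching(graph)
print(len(matching))  # prints 3
```

There is one augmentation attempt per vertex in L and each attempt scans every edge at most once, which agrees with the O(VE) bound stated above.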

What is dynamic programming?
• It is a method to reduce the runtime of algorithms by:
  - Breaking the problem into smaller subproblems
  - Solving these subproblems optimally, storing each result so that it is computed only once
  - Using these optimal solutions to the subproblems to construct an optimal solution for the original problem

Example: Fibonacci Sequence
The calculation of the Fibonacci sequence is intuitively a recursive algorithm:
  if n = 0:       f(n) = 1
  else if n = 1:  f(n) = 1
  else:           f(n) = f(n-1) + f(n-2)
The runtime is T(n) = O(2^n) → very inefficient, because the same values are recomputed many times.

Now, if we use a dynamic programming algorithm it takes much less time to solve the problem: We build an array, with the first two cells containing a 1 each (Fib(0) and Fib(1)). To get the next cell (Fib(2)) we sum the content of the first two cells and store the result in the third cell. For the fourth cell we sum the content of the second and the third cell, and so on. This way every number is only calculated once. The runtime of this algorithm is linear: T(n) = O(n). The array also takes O(n) space, but since each step only reads the two previous cells, keeping just those two values reduces the space requirement to a constant.
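A minimal Python sketch of the table-filling idea, using the slides' convention Fib(0) = Fib(1) = 1. The function names `fib_table` and `fib_two_values` are illustrative assumptions. The first version keeps the whole array, while the second keeps only the last two values.

```python
def fib_table(n):
    """Fill an array bottom-up: O(n) time, O(n) space."""
    table = [1] * (n + 1)  # table[0] = Fib(0) = 1 and table[1] = Fib(1) = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]  # each cell is computed once
    return table[n]


def fib_two_values(n):
    """Same recurrence, keeping only the last two cells: O(n) time, O(1) space."""
    prev, curr = 1, 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr


print(fib_table(10), fib_two_values(10))  # prints 89 89
```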

Longest Common Subsequence Problem (LCSS)
What is the difference between a substring and a subsequence?
• A substring is a contiguous “mini string” within a string.
• A subsequence is a number of characters in the same order as within the original string, but it may be non-contiguous.
String A:       a b b a c d e   (1-2-3-4-5-6-7)
Substring B:    a b b a         (1-2-3-4)
Subsequence C:  a a c e         (1-4-5-7)

LCSS
Finding the longest common subsequence (lcs) is important for applications like spell checking or finding alignments in DNA strings in the field of bioinformatics (e.g. BLAST).
How does it work? There are several methods, e.g. the brute force method, or the LCSS algorithm, which is a dynamic programming method.

Brute Force Method
Given two strings A and B, find the set S of all possible subsequences of A, S = {A1, A2, …, Ak}. Then check whether each of those subsequences of A is also a subsequence of B, keeping track of the longest one found. A string of length m has 2^m subsequences and each check against B takes O(n) time, so this algorithm uses O(n * 2^m) time → very inefficient, especially for longer strings like DNA strands.
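For comparison with the dynamic programming method, the brute force idea can be sketched as below. The names `is_subsequence` and `lcs_brute_force` are assumptions. The sketch enumerates subsequences of A from longest to shortest so that it can stop at the first hit, but the worst case is still exponential.

```python
from itertools import combinations


def is_subsequence(s, t):
    """True if s can be obtained from t by deleting characters."""
    remaining = iter(t)
    # "ch in remaining" advances the iterator, so the order is respected.
    return all(ch in remaining for ch in s)


def lcs_brute_force(a, b):
    """Try every subsequence of a, longest first, and test it against b."""
    for length in range(len(a), -1, -1):
        for positions in combinations(range(len(a)), length):
            candidate = "".join(a[i] for i in positions)
            if is_subsequence(candidate, b):
                return candidate
    return ""


print(lcs_brute_force("abbacde", "aace"))  # prints aace
```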

LCSS
The algorithm uses four steps:
• Analyze lcs properties: Let A = <a_1, …, a_m> and B = <b_1, …, b_n> be strings and let C = <c_1, …, c_k> be any lcs of A and B. A_i denotes the prefix <a_1, …, a_i> of A, and likewise for B_j and C_i.
  - If a_m = b_n, then c_k = a_m = b_n, and C_{k-1} is an lcs of A_{m-1} and B_{n-1}.
  - If a_m ≠ b_n and c_k ≠ a_m, then C is an lcs of A_{m-1} and B.
  - If a_m ≠ b_n and c_k ≠ b_n, then C is an lcs of A and B_{n-1}.

LCSS
• Devise a recursive solution:
  x[i, j] = 0,                            if i = 0 or j = 0
  x[i, j] = x[i-1, j-1] + 1,              if i, j > 0 and a_i = b_j
  x[i, j] = max(x[i, j-1], x[i-1, j]),    if i, j > 0 and a_i ≠ b_j
  where x[i, j] is the length of an lcs of the prefixes A_i and B_j. If either of the strings has length 0, the lcs is empty and has length 0. The other parts of the formula recursively break the problem into smaller subproblems until the null-string is reached.

LCSS
• Compute the lcs: There are several algorithms that do this; some use exponential time, others use polynomial time. We will show the algorithm from class. It is called “Generic LCS Delta Algorithm” and it uses T(n) = O(m*n) time and S(n) = O(min(m, n)) space if only one array (table) is used and S(n) = O(m+n) otherwise.
• The algorithm creates a table t whose dimensions are the lengths m and n of A and B respectively. A second table s, a copy of t, is used to store the current optimal solution.

LCSS
[Figure: table with string A of length m along one axis and string B of length n along the other; each cell shows x[i, j].]
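The table can be filled with the standard lcs-length recurrence (x[i, j] = x[i-1, j-1] + 1 on a character match, otherwise the maximum of the two neighbouring cells). The sketch below is a plain full-table version in Python, not the “Generic LCS Delta Algorithm” from class; the function name `lcs` and the backtracking step that recovers one lcs are assumptions added for illustration.

```python
def lcs(a, b):
    """Return (length, one longest common subsequence) of strings a and b."""
    m, n = len(a), len(b)
    # x[i][j] = length of an lcs of the prefixes a[:i] and b[:j].
    # Row 0 and column 0 stay 0 (empty prefix).
    x = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                x[i][j] = x[i - 1][j - 1] + 1
            else:
                x[i][j] = max(x[i][j - 1], x[i - 1][j])

    # Walk back from the bottom-right cell to recover one lcs.
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            chars.append(a[i - 1])
            i -= 1
            j -= 1
        elif x[i - 1][j] >= x[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return x[m][n], "".join(reversed(chars))


print(lcs("abbacde", "aace"))  # prints (4, 'aace')
```

Filling the table takes O(m*n) time. If only the length is needed, each row depends only on the previous row, so the table can be reduced to a single row of length min(m, n) plus a few scalars.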

Substring Matching
Let A be a string of length n and P be a pattern of length m, m < n. We want to know whether P is contained in A.
• Place P at the beginning of A and compare.
• If it is a match, we are done. If it is no match, move P one place to the right and compare again.
• Repeat these steps until we find a match or until we cannot move P further to the right.
If we want to find all matches, we are not done after finding one match. We record the match and keep moving P one place to the right until we can move P no further.

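Substring Matching
The algorithm uses T(n) = O((n-m+1)*m) ≈ O(n*m) time: there are n-m+1 possible positions for P, and each comparison costs up to m character checks.
Example:
             1 2 3 4 5 6 7 8 9 10 11 12 13
String A:    a b c a b a a b c a  b  a  c
Pattern P:   a b a a
We have to move P to the right until it matches A → we move it 3 cells over, so that P lines up with positions 4-7 of A.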
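A minimal sketch of this shift-and-compare procedure in Python, returning every shift at which P matches. The function name `naive_find_all` is an assumption, and shifts are counted from 0, so a shift of 3 means "moved 3 cells over".

```python
def naive_find_all(text, pattern):
    """Return all shifts (0-based) at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for shift in range(n - m + 1):  # every position where P still fits
        # The slice comparison costs up to m character comparisons.
        if text[shift:shift + m] == pattern:
            shifts.append(shift)
    return shifts


print(naive_find_all("abcabaabcabac", "abaa"))  # prints [3]
```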

Knuth-Morris-Pratt Algorithm
The Knuth-Morris-Pratt Algorithm (KMP) is an optimization of the substring matching algorithm. It lets us bypass previously matched characters. It runs in T(n) = O(m+n) ≈ O(n) time, i.e. it is linear!
We have a string A of length n, a pattern P of length m and a prefix array Pr of the same length as the pattern, i.e. m. Pr[q] is the length of the longest proper prefix of P that is also a suffix of the first q characters of P. The prefix array tells us how far we can move pattern P along string A to the right after a mismatch. We have two pointers: one moves along A and the other along P.

Knuth-Morris-Pratt Algorithm
Example:
                  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
String A:         b a c b a b a b a b  a  c  a  c  a
Pattern P:        a b a b a c a
Prefix array Pr:  0 0 1 2 3 0 1
How do we construct the prefix array Pr? We put P against itself and move it along to the right, comparing while we move. Our P consists of seven characters.
• No shift: there cannot be anything matched in the first spot, so we put 0 in spot 1 (default).
• Shift 1: a ≠ b, so we get 0 for spot 2.
• Shift 2: aba now matches, so we put 1 in spot 3, 2 in spot 4 and 3 in spot 5; the next character does not match (b ≠ c).
• Shift 3: already the first letter does not match, a ≠ b.
• Shift 4: only one character matches (the first, a); the 3 we have in slot 5 is bigger, so we do not make a change.
• Shift 5: we have no match, a ≠ c, so we put 0 in slot 6.
• Shift 6: a = a, so we put 1 in slot 7.
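The sliding construction above takes quadratic time in the worst case if done literally. The usual linear-time way to compute the same array reuses the entries already computed, as in this sketch. The function name `prefix_array` is an assumption, and the list is 0-indexed, so `pr[i]` corresponds to slot i+1 in the slides.

```python
def prefix_array(pattern):
    """pr[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of pattern[:i+1]."""
    pr = [0] * len(pattern)
    k = 0  # length of the prefix matched so far
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = pr[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pr[i] = k
    return pr


print(prefix_array("ababaca"))  # prints [0, 0, 1, 2, 3, 0, 1]
```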

Knuth-Morris-Pratt Algorithm
So we have string A = b a c b a b a b a b a c a c a, pattern P = a b a b a c a, and prefix array Pr = 0 0 1 2 3 0 1.
  b a c b a b a b a b a c a c a
  a b a b a c a                     a ≠ b, so we move P 1 over
    a b a b a c a                   a = a, but b ≠ c; we have 1 match, so we check Pr at slot 1, which is 0, so we can skip no characters and move P 1 - 0 = 1 over
      a b a b a c a                 a ≠ c, so we move P 1 over
        a b a b a c a               a ≠ b, so we move P 1 over
          a b a b a c a             the first 5 characters match but then c ≠ b, so we check Pr at slot 5, which is 3, so we can skip 3 characters and move P 5 - 3 = 2 over
              a b a b a c a         now we have a full match (positions 7-13 of A), so we are done
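The walk-through can be reproduced with the following sketch of the KMP search loop. The function name `kmp_find_all` is an assumption, and the prefix array is computed inline so the block runs on its own. Instead of physically moving P, the code keeps a counter `k` of matched characters and resets it to `pr[k - 1]` on a mismatch, which has the same effect as shifting P by k - Pr[k] cells.

```python
def kmp_find_all(text, pattern):
    """Return all shifts (0-based) at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []

    # Prefix array: pr[i] = longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    pr = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = pr[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pr[i] = k

    shifts = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):  # the pointer along A never moves back
        while k > 0 and ch != pattern[k]:
            k = pr[k - 1]  # shift P, keeping the part that still matches
        if ch == pattern[k]:
            k += 1
        if k == m:  # full match ending at position i
            shifts.append(i - m + 1)
            k = pr[k - 1]  # continue, to find later matches as well
    return shifts


print(kmp_find_all("bacbabababacaca", "ababaca"))  # prints [6]
```

The result 6 is the 0-based shift, i.e. P lines up with positions 7-13 of A, as in the last step of the walk-through.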
