Discover the powerful technique of Dynamic Programming for solving optimization problems efficiently, such as finding the Longest Common Subsequence (LCS) in sequences. Learn how to formulate and implement DP solutions step by step.
Dynamic Programming Longest Common Subsequence
Dynamic Programming (DP)
• A powerful design technique for optimization problems
• Related to divide and conquer
• However, DP problems have overlapping subproblems, so a standard divide-and-conquer solution solves the same subproblems again and again and is not efficient
Dynamic Programming (DP)
• Main question: how do we set up the subproblem structure?
• For DP to be applicable to an optimization problem, it needs:
  • Optimal substructure: an optimal solution to the global problem is built from optimal solutions to its subproblems
  • Polynomially many distinct subproblems
  • Overlapping subproblems: the same subproblems recur many times
Longest Common Subsequence (LCS)
• Application: searching for a substring or pattern in a large piece of text
• Not necessarily the exact text, but something similar
• A method for measuring the degree of similarity: LCS
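Longest Common Subsequence (LCS)
• Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both.
• A subsequence keeps the characters in their original order but need not be contiguous.
  x: A B C B D A B
  y: B D C A B A
  LCS(x, y) = B C B A (length 4)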
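To make the definition concrete, here is a minimal Python sketch that checks whether a string is a common subsequence of x and y. The helper names `is_subsequence` and `is_common_subsequence` are illustrative (they do not come from the slides), and the sequences are written as strings without spaces.

```python
def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting zero or more characters."""
    it = iter(t)
    # `ch in it` consumes the iterator up to and including the first match,
    # so the characters of s must appear in t in the same order.
    return all(ch in it for ch in s)


def is_common_subsequence(z: str, x: str, y: str) -> bool:
    """True if z is a subsequence of both x and y."""
    return is_subsequence(z, x) and is_subsequence(z, y)


x, y = "ABCBDAB", "BDCABA"
print(is_common_subsequence("BCBA", x, y))  # True
print(is_common_subsequence("BCBB", x, y))  # False
```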
LCS
• Not always unique:
  X: A B C
  Y: B A C
  Both A C and B C are longest common subsequences (length 2).
Brute Force Solution?
• Check every subsequence of x[1 . . m] to see whether it is also a subsequence of y[1 . . n].
Analysis
• Checking takes O(n) time per subsequence.
• There are 2^m subsequences of x (each bit-vector of length m determines a distinct subsequence of x).
• Thus the worst-case running time is O(n · 2^m): exponential.
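The brute-force idea can be sketched in Python as follows, assuming sequences are plain strings. `lcs_brute_force` is a hypothetical helper: it enumerates subsequences of x longest-first with `itertools.combinations` (each index set plays the role of one bit-vector) and checks each against y, so it is only practical for very short inputs.

```python
from itertools import combinations


def is_subsequence(s: str, t: str) -> bool:
    it = iter(t)
    return all(ch in it for ch in s)  # O(len(t)) per check


def lcs_brute_force(x: str, y: str) -> set[str]:
    """Return every longest common subsequence of x and y by exhaustive search.

    Subsequences of x are generated longest-first; each index set from
    combinations() corresponds to one bit-vector of length m. There are 2**m
    of them and each check costs O(n), so the worst case is O(n * 2**m).
    """
    m = len(x)
    for k in range(m, -1, -1):
        found = set()
        for idx in combinations(range(m), k):
            z = "".join(x[i] for i in idx)
            if is_subsequence(z, y):
                found.add(z)
        if found:  # the first non-empty length level is the longest
            return found
    return {""}  # not reached: k = 0 always yields the empty subsequence
```

On the non-uniqueness example it finds both answers: `lcs_brute_force("ABC", "BAC")` returns `{"AC", "BC"}`.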
DP Formulation for LCS
• Subproblems?
• Consider all pairs of prefixes
• A prefix of a sequence is just an initial string of characters
• Let X_i denote the prefix of X with i characters, X_i = X[1 . . i] (e.g., for x below, X_4 = A B C B)
• Let X_0 denote the empty sequence
  x: A B C B D A B
DP Formulation
• Compute the LCS for every possible pair of prefixes
• C[i,j]: length of the LCS of X_i and Y_j (Y_j is the prefix of Y with j characters)
• Then C[m,n] is the length of the LCS of X and Y
  x: A B C B D A B
  y: B D C A B A
Recursive formulation
• C[i,0] = C[0,j] = 0 // base case: an empty prefix has only the empty subsequence in common with anything
• How can we compute C[i,j] using solutions to subproblems?
Recursive formulation: two cases
• Case 1, last characters match: if X[i] = Y[j], an LCS of X_i and Y_j can be taken to end with this same character
  x: A B C B D A B
  y: B D C A B A
  (Figure: prefixes X_i and Y_j, both ending in A)
  C[i,j] = C[i-1, j-1] + 1
  LCS(X_i, Y_j) = LCS(X_{i-1}, Y_{j-1}) + X[i]
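Recursive formulation (2)
(Figure: prefixes X_i ending in B and Y_j ending in A)
• Case 2, last characters do NOT match: if X[i] ≠ Y[j], the LCS of X_i and Y_j is the longer of
  LCS(X_{i-1}, Y_j) // skip X[i]
  LCS(X_i, Y_{j-1}) // skip Y[j]
  C[i,j] = max(C[i, j-1], C[i-1, j])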
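Putting the base case and both cases together gives the full recurrence for C[i,j], restated here as one formula using the slides' 1-based indices:

```latex
C[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
C[i-1,\,j-1] + 1 & \text{if } i,j > 0 \text{ and } x[i] = y[j],\\
\max\bigl(C[i,\,j-1],\; C[i-1,\,j]\bigr) & \text{if } i,j > 0 \text{ and } x[i] \neq y[j].
\end{cases}
```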
A recursive algorithm
LCS(x, y, i, j) {
  if i = 0 or j = 0 then return 0 // base case
  if x[i] = y[j] then c[i, j] ← LCS(x, y, i-1, j-1) + 1
  else c[i, j] ← max{ LCS(x, y, i-1, j), LCS(x, y, i, j-1) }
  return c[i, j]
}
• Worst case: x[i] ≠ y[j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.
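Recursion tree (for m = 3, n = 4)
  (3,4) → (2,4), (3,3)
  (2,4) → (1,4), (2,3)
  (3,3) → (2,3), (3,2)
• Height = O(m + n), since every call decrements i or j
• Running time = potentially exponential: a binary tree of height O(m + n) can have up to O(2^(m+n)) nodes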
BUT we keep recomputing the same subproblems: in the tree above, subproblem (2,3) is already evaluated twice, once under (2,4) and once under (3,3).
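The recursive algorithm can be transcribed into Python roughly as follows. This is a sketch: `lcs_naive` is a hypothetical name, indices are 1-based as on the slides (so x[i] becomes `x[i - 1]`), and a `collections.Counter` records how often each subproblem (i, j) is evaluated, to make the repeated work visible.

```python
from collections import Counter


def lcs_naive(x: str, y: str) -> tuple[int, Counter]:
    """Naive recursive LCS length that follows the recurrence directly.

    Also returns a Counter mapping each subproblem (i, j) to the number of
    times it was evaluated. Indices are 1-based as on the slides.
    """
    calls: Counter = Counter()

    def rec(i: int, j: int) -> int:
        calls[(i, j)] += 1
        if i == 0 or j == 0:       # base case: an empty prefix
            return 0
        if x[i - 1] == y[j - 1]:   # last characters match
            return rec(i - 1, j - 1) + 1
        # last characters differ: skip x[i] or skip y[j], keep the better result
        return max(rec(i - 1, j), rec(i, j - 1))

    return rec(len(x), len(y)), calls


length, calls = lcs_naive("ABCBDAB", "BDCABA")
print(length)                   # 4
print(sum(calls.values()))      # total number of recursive calls
print(calls.most_common(3))     # the most frequently recomputed (i, j) pairs
```

With x = "AAA" and y = "BBBB" (no matches, m = 3, n = 4), `calls[(2, 3)]` comes out as 2, matching the repeated node in the recursion tree.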
Overlapping subproblems
• A recursive solution contains a “small” number of distinct subproblems, repeated many times
• The number of distinct LCS subproblems for two strings of lengths m and n is only mn ((m+1)(n+1) if the empty-prefix base cases are counted)
Store the solutions of subproblems
• After computing a solution to a subproblem, store it in a table
• C[0..m, 0..n] stores the LCS lengths: C[i,j] for X_i and Y_j
• Also keep a helper array B[0..m, 0..n] of pointers (DIAG, UP, LEFT) used to extract the LCS later
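Storing solutions can also be done top-down, a variant usually called memoization: keep the recursive structure but cache each C[i,j] the first time it is computed. A minimal Python sketch, assuming `functools.lru_cache` serves as the table (the slides use an explicit array C instead); `lcs_length_memo` is a hypothetical name.

```python
from functools import lru_cache


def lcs_length_memo(x: str, y: str) -> int:
    """Top-down LCS length: the same recursion, but each (i, j) is solved once.

    lru_cache stores every computed C[i, j], so there are at most
    (m + 1) * (n + 1) distinct calls and the running time is O(mn).
    """
    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> int:
        if i == 0 or j == 0:          # base case: an empty prefix
            return 0
        if x[i - 1] == y[j - 1]:      # slide's x[i] is x[i - 1] here
            return c(i - 1, j - 1) + 1
        return max(c(i - 1, j), c(i, j - 1))

    return c(len(x), len(y))


print(lcs_length_memo("ABCBDAB", "BDCABA"))  # 4
```

Because the recursion depth grows with m + n, Python's default recursion limit makes this version unsuitable for long strings; the bottom-up table avoids recursion entirely.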
LCS(x[1..m], y[1..n]) {
  int C[0..m, 0..n]; int B[0..m, 0..n]
  for i = 0 to m
    C[i,0] = 0; B[i,0] = UP
  for j = 0 to n
    C[0,j] = 0; B[0,j] = LEFT
  for i = 1 to m
    for j = 1 to n
      if x[i] == y[j]
        C[i,j] = C[i-1, j-1] + 1; B[i,j] = DIAG
      else if C[i-1, j] >= C[i, j-1]
        C[i,j] = C[i-1, j]; B[i,j] = UP
      else
        C[i,j] = C[i, j-1]; B[i,j] = LEFT
  return C[m,n] // B is kept for ExtractLCS
}
Running time: O(mn), since each of the mn inner cells is filled in O(1) time.
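A direct Python transcription of the bottom-up pseudocode, as a sketch: `lcs_tables` is a hypothetical name, the pointer values are plain strings, and Python's 0-based string indexing means the slide's x[i] is `x[i - 1]`. It returns both tables so the traceback step has B available.

```python
DIAG, UP, LEFT = "DIAG", "UP", "LEFT"


def lcs_tables(x: str, y: str) -> tuple[list[list[int]], list[list[str]]]:
    """Bottom-up LCS: fill C (lengths) and B (pointers) as in the pseudocode.

    Row and column 0 of each table are the empty-prefix base cases.
    """
    m, n = len(x), len(y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    B = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        B[i][0] = UP
    for j in range(n + 1):
        B[0][j] = LEFT
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1
                B[i][j] = DIAG
            elif C[i - 1][j] >= C[i][j - 1]:  # ties go UP, as in the pseudocode
                C[i][j] = C[i - 1][j]
                B[i][j] = UP
            else:
                C[i][j] = C[i][j - 1]
                B[i][j] = LEFT
    return C, B


C, B = lcs_tables("BACDB", "BDCB")
for row in C:
    print(*row)
```

For X = "BACDB" and Y = "BDCB" this should print the rows 0 0 0 0 0, 0 1 1 1 1, 0 1 1 1 1, 0 1 1 2 2, 0 1 2 2 2, 0 1 2 2 3, so C[5][4] = 3.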
Tables C and B
• Compute the tables bottom-up
• Example: X = B A C D B (m = 5), Y = B D C B (n = 4). Length table C:

      -  B  D  C  B
   -  0  0  0  0  0
   B  0  1  1  1  1
   A  0  1  1  1  1
   C  0  1  1  2  2
   D  0  1  2  2  2
   B  0  1  2  2  3

• Start from B[m,n] and follow the pointers
• Extract the entries marked DIAG
• LCS for the above example: B C B
ExtractLCS(B, X, i, j) { // initially called with (B, X, m, n)
  if i == 0 OR j == 0 return
  if B[i,j] == DIAG
    ExtractLCS(B, X, i-1, j-1); print X[i]
  else if B[i,j] == UP
    ExtractLCS(B, X, i-1, j)
  else // LEFT
    ExtractLCS(B, X, i, j-1)
}
Running time: O(m + n), since each call decrements i, j, or both.
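The pointer table B is convenient but not strictly required: each pointer can be re-derived from C and the two characters using the same rule (DIAG on a match, UP on ties, else LEFT). The Python sketch below uses that variant, which differs from the slides' method, and walks back iteratively rather than recursively; `lcs_length_table` and `extract_lcs` are hypothetical names.

```python
def lcs_length_table(x: str, y: str) -> list[list[int]]:
    """Fill the length table C bottom-up (row/column 0 = empty prefix)."""
    m, n = len(x), len(y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1
            else:
                C[i][j] = max(C[i - 1][j], C[i][j - 1])
    return C


def extract_lcs(C: list[list[int]], x: str, y: str) -> str:
    """Iterative traceback from (m, n), re-deriving each B pointer from C."""
    i, j = len(x), len(y)
    out: list[str] = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:          # DIAG: this character is in the LCS
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif C[i - 1][j] >= C[i][j - 1]:  # UP (ties go UP, as in the pseudocode)
            i -= 1
        else:                             # LEFT
            j -= 1
    return "".join(reversed(out))  # characters were collected back to front


x, y = "BACDB", "BDCB"
print(extract_lcs(lcs_length_table(x, y), x, y))  # BCB
```

On x = "ABCBDAB", y = "BDCABA" the same routine returns BCBA.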
Example of LCS (video)
• https://www.youtube.com/watch?v=NnD96abizww