Dynamic Programming for Longest Common Subsequence

Dynamic Programming (DP) • A powerful design technique for optimization problems • Related to divide and conquer • However, because the subproblems in DP problems overlap, standard divide-and-conquer solutions are not efficient

Dynamic Programming (DP) • Main question: how to set up the subproblem structure? • For DP to be applicable to an optimization problem, the problem should have: • Optimal substructure: an optimal solution to the global problem is composed of optimal solutions to its subproblems • Polynomially many subproblems • Overlapping subproblems

Longest Common Subsequence (LCS) • Application: searching for a substring or pattern in a large piece of text • Not necessarily exact text but something similar • Method for measuring degree of similarity: LCS

Longest Common Subsequence (LCS) • Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. x: A B C B D A B y: B D C A B A LCS(x,y) = BCBA

LCS • Not always unique • X: A B C • Y: B A C • Both AC and BC are longest common subsequences (length 2)

Brute Force Solution? • Check every subsequence of x[1..m] to see if it is also a subsequence of y[1..n] • Analysis • Checking = O(n) time per subsequence • 2^m subsequences of x (each bit-vector of length m determines a distinct subsequence of x) • Thus the worst-case running time is O(n 2^m), i.e., exponential time
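The brute-force idea can be sketched in Python as follows. This is only an illustration under assumptions: the function names `is_subsequence` and `lcs_brute_force` are hypothetical, and the sketch enumerates index subsets of x (the bit-vectors mentioned above) from longest to shortest so that it can stop at the first match.

```python
from itertools import combinations

def is_subsequence(s, t):
    """Return True if s is a subsequence of t; scans t once, so O(len(t))."""
    it = iter(t)
    # 'ch in it' advances the iterator until ch is found (or it is exhausted)
    return all(ch in it for ch in s)

def lcs_brute_force(x, y):
    """Try subsequences of x from longest to shortest; return the first one found in y."""
    for k in range(len(x), -1, -1):
        # each k-subset of positions of x selects one subsequence of length k
        for idx in combinations(range(len(x)), k):
            cand = "".join(x[i] for i in idx)
            if is_subsequence(cand, y):
                return cand
    return ""  # not reached: the empty subsequence always matches at k = 0

print(len(lcs_brute_force("ABCBDAB", "BDCABA")))  # 4
```

In the worst case the loops visit all 2^m index subsets, which is why this approach is only usable for very short inputs.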

DP Formulation for LCS • Subproblems? • Consider all pairs of prefixes • A prefix of a sequence is just an initial string of characters • Let X_i denote the prefix of X with i characters • Let X_0 denote the empty sequence • Example: for x: A B C B D A B, X_4 = A B C B

DP Formulation • Compute the LCS for every possible pair of prefixes • C[i,j]: length of the LCS of X_i and Y_j (the prefixes of X and Y with i and j characters) • Then, C[m,n]: length of LCS of X and Y • x: A B C B D A B • y: B D C A B A

Recursive formulation • C[i,0] = C[0,j] = 0 // base case • How can I compute C[i,j] using solutions to subproblems?

Recursive formulation • Two cases • (1) Last characters match: if X[i] = Y[j], the LCS must end with this same character • C[i,j] = C[i-1, j-1] + 1 • LCS(X[1..i], Y[1..j]) = LCS(X[1..i-1], Y[1..j-1]) + X[i]

Recursive formulation (2) • Last characters do not match: if X[i] ≠ Y[j], the LCS is the longer of • LCS(X[1..i-1], Y[1..j]) // skip X[i] • LCS(X[1..i], Y[1..j-1]) // skip Y[j] • C[i,j] = max(C[i, j-1], C[i-1, j])
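Putting the base case and the two cases together, the full recurrence for the length table can be written as a single formula (given here in LaTeX, using the same C, X, and Y as the slides):

```latex
C[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
C[i-1,\,j-1] + 1 & \text{if } i, j > 0 \text{ and } X[i] = Y[j],\\
\max\bigl(C[i,\,j-1],\; C[i-1,\,j]\bigr) & \text{if } i, j > 0 \text{ and } X[i] \neq Y[j].
\end{cases}
```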

A recursive algorithm
LCS(x, y, i, j) {
  if i = 0 or j = 0 then return 0   // base case: an empty prefix
  if x[i] = y[j]
    then c[i, j] ← LCS(x, y, i-1, j-1) + 1
    else c[i, j] ← max{ LCS(x, y, i-1, j), LCS(x, y, i, j-1) }
  return c[i, j]
}
• Worst case: x[i] ≠ y[j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.

Recursion tree • Root (3,4) has children (2,4) and (3,3) • (2,4) has children (1,4) and (2,3) • (3,3) has children (2,3) and (3,2) • Height = O(m+n) • Running time = potentially exponential!

But we keep recomputing the same subproblems • In the recursion tree rooted at (3,4), subproblem (2,3) already appears twice: once under (2,4) and once under (3,3)
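The recomputation can be made visible with a small Python sketch. It assumes a hypothetical helper `lcs_naive` that follows the plain recursion (with 0-indexed strings, so x[i-1] plays the role of x[i]) and records every call in a `Counter`. The inputs are chosen so that no characters match, which is the worst case described above.

```python
from collections import Counter

def lcs_naive(x, y, i, j, calls):
    """Length of an LCS of the first i chars of x and first j chars of y, storing nothing."""
    calls[(i, j)] += 1            # record every visit to subproblem (i, j)
    if i == 0 or j == 0:          # base case: an empty prefix
        return 0
    if x[i - 1] == y[j - 1]:      # Python strings are 0-indexed
        return lcs_naive(x, y, i - 1, j - 1, calls) + 1
    return max(lcs_naive(x, y, i - 1, j, calls),
               lcs_naive(x, y, i, j - 1, calls))

calls = Counter()
x, y = "AAA", "BBBB"              # no characters match: every call branches twice
print(lcs_naive(x, y, len(x), len(y), calls))   # 0
print(calls[(2, 3)])                            # 2, as in the recursion tree
print(sum(calls.values()), "calls for", len(calls), "distinct subproblems")
```

The last line shows that the total number of calls is larger than the number of distinct (i, j) pairs, and the gap widens quickly as the strings grow.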

Overlapping subproblems • A recursive solution contains a “small” number of distinct subproblems repeated many times • The number of distinct LCS subproblems for two strings of lengths m and n is only mn

Store the solutions of subproblems • After computing a solution to a subproblem, store it in a table • C[0..m, 0..n] stores the LCS lengths: C[i,j] is the LCS length for the first i characters of x and the first j characters of y • Also keep a helper array B[0..m, 0..n] of pointers (UP, LEFT, DIAG) used to extract the LCS later
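One way to store the solutions is to keep the recursion and add a lookup table. This is a top-down (memoized) variant, not the bottom-up table fill the slides use. The sketch below is an assumption-laden illustration: `lcs_length_memo` and `solve` are hypothetical names, and a Python dict stands in for the table C.

```python
def lcs_length_memo(x, y):
    """Top-down LCS length; each subproblem (i, j) is solved at most once."""
    memo = {}   # maps (i, j) -> LCS length of the first i chars of x and first j chars of y

    def solve(i, j):
        if i == 0 or j == 0:              # base case is cheap, so it is not stored
            return 0
        if (i, j) not in memo:
            if x[i - 1] == y[j - 1]:      # 0-indexed strings
                memo[(i, j)] = solve(i - 1, j - 1) + 1
            else:
                memo[(i, j)] = max(solve(i - 1, j), solve(i, j - 1))
        return memo[(i, j)]

    length = solve(len(x), len(y))
    return length, len(memo)              # also report how many entries were stored

length, stored = lcs_length_memo("ABCBDAB", "BDCABA")
print(length)            # 4
print(stored <= 7 * 6)   # True: never more than m*n stored subproblems
```

Because each stored entry is computed once with constant extra work, the running time is bounded by O(mn), matching the count of distinct subproblems.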

LCS(x[1..m], y[1..n]) {
  int C[0..m, 0..n]; int B[0..m, 0..n]
  for i = 0 to m
    C[i,0] = 0; B[i,0] = UP
  for j = 0 to n
    C[0,j] = 0; B[0,j] = LEFT
  for i = 1 to m
    for j = 1 to n
      if x[i] == y[j]
        C[i,j] = C[i-1, j-1] + 1; B[i,j] = DIAG
      else if (C[i-1,j] >= C[i, j-1])
        C[i,j] = C[i-1, j]; B[i,j] = UP
      else
        C[i,j] = C[i, j-1]; B[i,j] = LEFT
  return C[m,n]
}
Running time: O(mn)
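A direct Python transcription of this table-filling procedure might look as follows. It is a sketch, assuming 0-indexed strings (so x[i-1] corresponds to x[i] in the pseudocode), nested lists for C and B, and string constants for the three pointer values. The name `lcs_tables` is hypothetical.

```python
UP, LEFT, DIAG = "UP", "LEFT", "DIAG"

def lcs_tables(x, y):
    """Fill the length table C and pointer table B bottom-up in O(mn) time."""
    m, n = len(x), len(y)
    C = [[0] * (n + 1) for _ in range(m + 1)]    # row 0 and column 0 stay 0
    B = [[UP] * (n + 1) for _ in range(m + 1)]   # column 0 points UP
    for j in range(n + 1):
        B[0][j] = LEFT                           # row 0 points LEFT, as in the pseudocode
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:             # last characters match
                C[i][j] = C[i - 1][j - 1] + 1
                B[i][j] = DIAG
            elif C[i - 1][j] >= C[i][j - 1]:     # skip x[i]; ties go UP
                C[i][j] = C[i - 1][j]
                B[i][j] = UP
            else:                                # skip y[j]
                C[i][j] = C[i][j - 1]
                B[i][j] = LEFT
    return C, B

C, B = lcs_tables("BACDB", "BDCB")
for row in C:
    print(row)
print(C[5][4])   # 3
```

Rows are filled top to bottom and each row left to right, so every entry that C[i][j] depends on is already available when it is computed.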

Compute tables bottom-up • Tables C and B • Length table C (rows: X = B A C D B, columns: Y = B D C B):

         B  D  C  B
      0  0  0  0  0
   B  0  1  1  1  1
   A  0  1  1  1  1
   C  0  1  1  2  2
   D  0  1  2  2  2
   B  0  1  2  2  3

• Start from B[m,n], follow pointers • Extract entries with DIAG • LCS for above example: BCB

ExtractLCS(B, X, i, j) {   // initially called with (B, X, m, n)
  if i == 0 OR j == 0
    return
  if B[i,j] == DIAG
    ExtractLCS(B, X, i-1, j-1)
    print X[i]
  else if B[i,j] == UP
    ExtractLCS(B, X, i-1, j)
  else   // LEFT
    ExtractLCS(B, X, i, j-1)
}
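The extraction step can also be sketched end to end in Python. To keep the block self-contained it rebuilds C and B first, then follows the pointers from B[m][n]. Two things here are assumptions rather than part of the slides: the function name `lcs_string`, and the use of a loop that collects characters in reverse instead of the recursive print, which avoids deep recursion on long inputs.

```python
def lcs_string(x, y):
    """Build C and B bottom-up, then follow B's pointers to recover one LCS."""
    m, n = len(x), len(y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    B = [[None] * (n + 1) for _ in range(m + 1)]   # row 0 / column 0 are never read
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                C[i][j], B[i][j] = C[i - 1][j - 1] + 1, "DIAG"
            elif C[i - 1][j] >= C[i][j - 1]:
                C[i][j], B[i][j] = C[i - 1][j], "UP"
            else:
                C[i][j], B[i][j] = C[i][j - 1], "LEFT"

    out = []                       # characters are found last-to-first
    i, j = m, n
    while i > 0 and j > 0:
        if B[i][j] == "DIAG":      # x[i] is part of the LCS
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif B[i][j] == "UP":
            i -= 1
        else:                      # LEFT
            j -= 1
    return "".join(reversed(out))

print(lcs_string("BACDB", "BDCB"))   # BCB
```

The walk takes at most m + n steps, since every step decreases i, j, or both. Which LCS is returned when several exist depends on the tie-breaking rule (here ties go UP, as in the pseudocode).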

Example of LCS • https://www.youtube.com/watch?v=NnD96abizww
