Exploring Longest Common Subsequence Problem
Understand the efficient algorithms for finding the longest common subsequence between two strings. Learn about dynamic programming, subsequence recovery, and practical applications in text editing and document comparison.
The Longest Common Subsequence Problem CSE 373 Data Structures
Reading Goodrich and Tamassia, 3rd ed, Chapter 12, section 11.5, pp.570-574. CSE 373 AU 04 -- Longest Common Subsequences
Motivation • Two Problems and Methods for String Comparison: • The substring problem • The longest common subsequence problem. • In both cases, good algorithms do substantially better than the brute force methods.
String Matching Problem • Given two strings TEXT and PATTERN, find the first occurrence of PATTERN in TEXT. • Useful in text editing, document analysis, genome analysis, etc.
String Matching Problem: Brute-Force Algorithm
Here n is the length of TEXT and m is the length of PATTERN.
For i = 0 to n - m {
    For j = 0 to m - 1 {
        If TEXT[i + j] ≠ PATTERN[j] then break
        If j = m - 1 then return i
    }
}
return -1
Suppose TEXT = 0000000000001 PATTERN = 0000001. On inputs of this type the brute-force algorithm makes about (n - m + 1)·m character comparisons, which is Θ(n²) behavior when m is proportional to n. A more efficient algorithm is the Boyer-Moore algorithm. (We will not be covering it in this course.)
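The brute-force search above can be sketched in Python as follows. This is a minimal illustration, not code from the course; the function name `brute_force_find` and its parameter names are assumptions made for this example.

```python
def brute_force_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Hypothetical helper name; mirrors the slide's nested-loop pseudocode.
    """
    n, m = len(text), len(pattern)
    # Try every alignment i of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        # Advance j while the characters at this alignment agree.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:  # all m characters matched
            return i
    return -1  # no alignment matched


if __name__ == "__main__":
    # The slide's example: almost every alignment matches six zeros
    # before failing on the final character.
    print(brute_force_find("0000000000001", "0000001"))
```

On the slide's example, every alignment except the last one compares the whole pattern before failing, which is what makes this input a bad case for the brute-force method.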
Longest Common Subsequence Problem • A Longest Common Subsequence (LCS) of two strings S1 and S2 is a longest string that can be obtained from S1 and from S2 by deleting elements. • For example, S1 = “thoughtful” and S2 = “shuffle” have an LCS: “hufl”. • Useful in spelling correction, document comparison, etc.
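To make "obtained by deleting elements" concrete, here is a small Python sketch that checks whether one string is a subsequence of another. The name `is_subsequence` is an assumption for this example, not something from the slides.

```python
def is_subsequence(candidate: str, source: str) -> bool:
    """True if candidate can be obtained from source by deleting characters.

    Hypothetical helper name, used only for illustration.
    """
    remaining = iter(source)
    # "ch in remaining" consumes the iterator up to and including the first
    # match, so later characters must be found further to the right.
    return all(ch in remaining for ch in candidate)


if __name__ == "__main__":
    # The slide's example: "hufl" is a subsequence of both strings.
    print(is_subsequence("hufl", "thoughtful"))
    print(is_subsequence("hufl", "shuffle"))
```

A check like this confirms that a string is a common subsequence, but not that it is a longest one.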
Dynamic Programming • Analyze the problem in terms of a number of smaller subproblems. • Solve the subproblems and keep their answers in a table. • Each subproblem’s answer is easily computed from the answers to its own subproblems.
Longest Common Subsequence: Algorithm using Dynamic Programming • For every prefix of S1 and prefix of S2 we’ll compute the length L of an LCS. • In the end, we’ll get the length of an LCS for S1 and S2 themselves. • The subsequence can be recovered from the matrix of L values. • (see demonstration)
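The slides defer the details to a demonstration, so what follows is a sketch of the standard textbook formulation, which may differ from the one shown in class. It assumes L[i][j] is the LCS length of the first i characters of S1 and the first j characters of S2: L[i][j] = L[i-1][j-1] + 1 when the last characters of the two prefixes match, and max(L[i-1][j], L[i][j-1]) otherwise. The function name `lcs` and the variable names are assumptions for this example.

```python
def lcs(s1: str, s2: str) -> str:
    """Return one longest common subsequence of s1 and s2.

    Hypothetical function name; standard O(n*m) table-filling approach.
    """
    n, m = len(s1), len(s2)
    # L[i][j] = LCS length of s1[:i] and s2[:j]; row 0 and column 0 stay 0.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Recover a subsequence by walking back from L[n][m] to the table edge.
    chars = []
    i, j = n, m
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            chars.append(s1[i - 1])  # this character is part of the LCS
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1  # the answer came from the shorter prefix of s1
        else:
            j -= 1  # the answer came from the shorter prefix of s2
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(lcs("thoughtful", "shuffle"))
```

The table has (n + 1)·(m + 1) entries and each is filled in constant time, so the whole computation takes time proportional to n·m. When several longest common subsequences exist, the tie-breaking rule in the walk-back determines which one is returned.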