This paper extends the String-to-String Correction Problem by adding the edit operation of swapping two adjacent characters to the traditional operations of substitution, insertion, and deletion. The goal is to determine the minimum number of edit operations required to transform a string T into another string P. The paper presents a dynamic programming algorithm that computes this minimum in O(|T||P|) time, using traces (graphical specifications of how each operation applies to the characters of the two strings) to reason about the sequence of operations. It also considers unequal operation costs and gives the condition, 2WS ≥ WI + WD, under which the algorithm remains correct.
An Extension of the String-to-String Correction Problem
Roy Lowrance and Robert A. Wagner
Journal of the ACM, vol. 22, no. 2, April 1975, pp. 177-183.
Speaker: 吳展碩
Edit Distance
• Three edit operations:
  • Substitution: abcd -> aacd (change b to a)
  • Insertion: abcd -> abacd (insert an a)
  • Deletion: abcd -> abd (delete c)
• Given two strings T and P, the problem is to determine the minimum number of edit operations needed to transform T into P.
Note: For clarity, we assume that all edit operations have the same cost.
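Let A = T and B = P, and let d[i, j] be the minimum number of operations that transform the first i characters of A into the first j characters of B. Then
$$ d[i,j] = \min\begin{cases} d[i-1,j] + 1 & \text{(delete } A[i]) \\ d[i,j-1] + 1 & \text{(insert } B[j]) \\ d[i-1,j-1] + \mathrm{cost}(A[i] \to B[j]) & \text{(substitute or match)} \end{cases} $$
where cost(A[i] -> B[j]) is 0 if A[i] = B[j] and 1 otherwise, with d[i, 0] = i and d[0, j] = j.
[Figure: worked edit-distance table, copied from Wikipedia]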
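The recurrence above can be sketched in Python as follows. This is an illustration, not the paper's code: the function name levenshtein is ours, and because Python strings are 0-indexed, A[i] corresponds to a[i - 1].

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance using substitution, insertion and deletion only."""
    m, n = len(a), len(b)
    # d[i][j] = minimum operations turning the prefix a[:i] into the prefix b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all i characters
    for j in range(n + 1):
        d[0][j] = j                      # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete A[i]
                          d[i][j - 1] + 1,          # insert B[j]
                          d[i - 1][j - 1] + cost)   # substitute or match
    return d[m][n]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("abcd", "cda"))        # 3
```

Note that without a swap operation, turning "ab" into "ba" already costs 2 (two substitutions).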
The Problem
• This paper extends the set of edit operations with the operation of interchanging two adjacent characters.
• Swap
• Example: T = a b c d, P = c d a
  a b c d -> a c d (delete b) -> c a d (swap a and c) -> c d a (swap a and d)
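To check that no shorter sequence exists for this example, a brute-force breadth-first search over all strings reachable by single operations can be used. This is a sketch for tiny inputs only (the search space grows exponentially); the helper names are hypothetical, and it assumes that intermediate strings only need characters that occur in T or P.

```python
from collections import deque


def neighbours(s: str, alphabet: str):
    """Every string reachable from s by one deletion, substitution, insertion or adjacent swap."""
    for i in range(len(s)):
        yield s[:i] + s[i + 1:]                        # delete s[i]
        for c in alphabet:
            if c != s[i]:
                yield s[:i] + c + s[i + 1:]            # substitute s[i] by c
    for i in range(len(s) + 1):
        for c in alphabet:
            yield s[:i] + c + s[i:]                    # insert c before position i
    for i in range(len(s) - 1):
        yield s[:i] + s[i + 1] + s[i] + s[i + 2:]      # swap s[i] and s[i + 1]


def min_ops_bfs(t: str, p: str) -> int:
    """Fewest operations turning t into p, found by breadth-first search."""
    alphabet = "".join(sorted(set(t) | set(p)))        # assumption: no other characters needed
    dist = {t: 0}
    queue = deque([t])
    while queue:
        s = queue.popleft()
        if s == p:
            return dist[s]
        for nxt in neighbours(s, alphabet):
            if nxt not in dist:
                dist[nxt] = dist[s] + 1
                queue.append(nxt)
    raise RuntimeError("target not reachable")         # cannot happen: p is always reachable


print(min_ops_bfs("abcd", "cda"))  # 3
print(min_ops_bfs("ab", "ba"))     # 1
```

The search confirms that three operations are needed for the slide's example, while a single swap now suffices for "ab" -> "ba".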
Trace
• A trace is a graphical specification of how edit operations apply to each character in the two strings.
• Example: T = a b c d, P = c d a
[Figure: trace for this example, not reproduced]
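One way to make the trace idea concrete, as a sketch under an assumed cost model: a trace is a set of 1-based lines (i, j) linking T[i] to P[j]; unconnected characters of T count as deletions, unconnected characters of P as insertions, a line between different characters as a substitution, and each pair of crossing lines as one swap. The paper's formal definition may restrict which lines are allowed to cross, and trace_cost is a hypothetical name.

```python
def trace_cost(t: str, p: str, lines: set) -> int:
    """Unit cost of a trace given as 1-based (i, j) pairs linking t[i] to p[j]."""
    # Each character may be touched by at most one line.
    assert len({i for i, _ in lines}) == len(lines)
    assert len({j for _, j in lines}) == len(lines)
    substitutions = sum(1 for i, j in lines if t[i - 1] != p[j - 1])
    deletions = len(t) - len(lines)     # characters of T with no line
    insertions = len(p) - len(lines)    # characters of P with no line
    # Two lines cross when their order in T is the reverse of their order in P.
    swaps = sum(1 for (i1, j1) in lines for (i2, j2) in lines if i1 < i2 and j1 > j2)
    return substitutions + deletions + insertions + swaps


# a -> a, c -> c, d -> d; b is unconnected (deleted).
print(trace_cost("abcd", "cda", {(1, 3), (3, 1), (4, 2)}))  # 3
```

Here the line from a crosses the lines from c and d, giving two swaps plus one deletion. This matches the sequence a b c d -> a c d -> c a d -> c d a on the previous slide.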
Important Properties
• In the following cases, a combination of edit operations can be replaced by other edit operations without increasing the total cost (unit costs):
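  • A character involved in 2 swaps -> insertion + deletion, or deletion + substitution
  • Swap + substitution -> 2 substitutions
  • A swap of two characters separated by K characters in T and L characters in P costs swap + K deletions + L insertions; in some of these cases, a trace with lower cost exists.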
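One way to see why the K/L case can admit a cheaper trace is worked out below under unit costs. This is our own arithmetic, not necessarily the paper's argument. Suppose T contains a u b and P contains b v a, with |u| = K and |v| = L. The alternative trace substitutes a -> b, aligns u with v using min(K, L) substitutions plus |K - L| insertions or deletions, and substitutes b -> a:

```latex
\begin{aligned}
\text{swap trace:} \quad & 1 + K + L \\
\text{substitution trace:} \quad & \underbrace{1}_{a \to b} + \underbrace{\max(K, L)}_{u \to v} + \underbrace{1}_{b \to a} = 2 + \max(K, L) \\
\text{difference:} \quad & (1 + K + L) - (2 + \max(K, L)) = \min(K, L) - 1
\end{aligned}
```

So once K ≥ 1 and L ≥ 1, the substitution trace is never more expensive, and it is strictly cheaper when min(K, L) ≥ 2.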
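The Algorithm
Let A = T and B = P. For each cell (i, j), let i' be the last position before i in A with A[i'] = B[j], and let j' be the last position before j in B with B[j'] = A[i]. Then
$$ d[i,j] = \min\begin{cases} d[i-1,j] + 1 \\ d[i,j-1] + 1 \\ d[i-1,j-1] + \mathrm{cost}(A[i] \to B[j]) \\ d[i'-1,j'-1] + (i-i'-1) + (j-j'-1) + 1 \end{cases} $$
The last term deletes the i - i' - 1 characters between A[i'] and A[i], swaps A[i'] and A[i], and inserts the j - j' - 1 characters between B[j'] and B[j]. It applies only when both i' and j' exist.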
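A direct, unoptimized transcription of this recurrence in Python follows, as a sketch. Assumptions: unit costs, a hypothetical function name, and i' and j' found by scanning backwards, which adds extra work per cell instead of using the preprocessing mentioned in the summary.

```python
def lw_distance_naive(a: str, b: str) -> int:
    """Unit-cost distance with substitution, insertion, deletion and adjacent swap."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # i' = last position before i in A holding B[j] (0 if none)
            ip = next((k for k in range(i - 1, 0, -1) if a[k - 1] == b[j - 1]), 0)
            # j' = last position before j in B holding A[i] (0 if none)
            jp = next((k for k in range(j - 1, 0, -1) if b[k - 1] == a[i - 1]), 0)
            if ip > 0 and jp > 0:
                # delete between, swap A[i'] and A[i], insert between
                best = min(best, d[ip - 1][jp - 1] + (i - ip - 1) + (j - jp - 1) + 1)
            d[i][j] = best
    return d[m][n]


print(lw_distance_naive("abc", "ca"))  # 2: delete b, then swap a and c
```

For the same pair, the recurrence without the swap term gives 3.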
Summary
• With simple preprocessing of T and P, so that i' and j' are available for every cell, the problem can be solved by dynamic programming in O(|T||P|) time.
• If the edit operations have different costs, insertion (cost WI), deletion (cost WD), swap (cost WS), and substitution (cost WC), the algorithm remains correct provided 2WS ≥ WI + WD.
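A sketch of the O(|T||P|) version with unequal costs follows. Here the "preprocessing" is assumed to be a table mapping each character to the last row in which it occurred, plus a running last-matching column within each row, as in common Damerau-Levenshtein implementations; the paper's own bookkeeping may differ. The parameters wi, wd, ws, wc mirror WI, WD, WS, WC, and rejecting weights with 2WS < WI + WD is our design choice.

```python
def lw_distance(a: str, b: str, wi: float = 1, wd: float = 1,
                ws: float = 1, wc: float = 1) -> float:
    """Weighted distance with insertion, deletion, adjacent swap and substitution."""
    # Correctness condition from the summary; other weights are refused.
    if 2 * ws < wi + wd:
        raise ValueError("this recurrence assumes 2*ws >= wi + wd")
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * wd
    for j in range(1, n + 1):
        d[0][j] = j * wi
    last_row = {}  # character -> last finished row i' (1-based) with A[i'] == character
    for i in range(1, m + 1):
        last_col = 0  # last column j' < j in this row with B[j'] == A[i]
        for j in range(1, n + 1):
            ip = last_row.get(b[j - 1], 0)  # i' for this cell (0 = none)
            jp = last_col                   # j' for this cell (0 = none)
            if a[i - 1] == b[j - 1]:
                sub = d[i - 1][j - 1]
                last_col = j
            else:
                sub = d[i - 1][j - 1] + wc
            best = min(d[i - 1][j] + wd, d[i][j - 1] + wi, sub)
            if ip > 0 and jp > 0:
                best = min(best, d[ip - 1][jp - 1]
                           + (i - ip - 1) * wd + ws + (j - jp - 1) * wi)
            d[i][j] = best
        last_row[a[i - 1]] = i
    return d[m][n]


print(lw_distance("ab", "ba", wc=5, ws=1.5))  # 1.5: one swap beats two substitutions
```

Each cell does constant work beyond the dictionary lookup, which is what gives the O(|T||P|) bound.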