Synthesizable, Space and Time Efficient Algorithms for String Editing Problem.

Synthesizable, Space and Time Efficient Algorithms for String Editing Problem. Vamsi K. Kundeti

Agenda. • Synthesizable: • Digital circuit to implement edit distance in hardware. • High speed and area efficient • Space and Time efficient algorithms: • Computing the edit script and edit distance in time O(n2/log(n)) and O(n) space.

Edit Distance Optimization Problem

Edit Distance in hardware. • Related work. • Parallel systolic array based designs. • Issues with systolic arrays. • e.g. [lipton86] , [lopresti87] & [sastry95] • Sequential design. • Area efficient and high speed. • Adding edit distance to instruction set of general CPU. • Speedup by reduction in constants.

Basic idea behind systolic arrays Entries computed In parallel. PE-7 PE-7 Entries computed By a single processor PE-6 Linear array. PE-5 PE-5 PE-1 PE-2 PE-3 PE-4

Basic idea behind systolic arrays Can be computed in parallel T = x Entries computed In parallel. PE-7 PE-7 Entries computed By a single processor PE-6 PE-5 PE-5 PE-1 PE-2 PE-3 PE-4

Basic idea behind systolic arrays T = x+1 T = x+2 Entries computed In parallel. PE-7 PE-7 Entries computed By a single processor PE-6 PE-5 PE-5 PE-1 PE-2 PE-3 PE-4

Systolic Array Issues S1 = [abc] , S2 = [bca] a_b_c b_c_a pe-1 pe-2 pe-3 pe-4 pe-5 b c a 1. pe-2 , pe-4 has to wait until pe-1 is done (synchronous) 2. pe-3 does more computation than others a b c 3. Increased IO complexity pe-5 pe-4 pe-3 pe-2 pe-1

Systolic Array Problems. • Pros: • Need only O(n) steps to compute edit distance • Cons: • Design is too complex. • Although we need only O(n) time we pay big price. • Clock Speed Reduction: The design needs a clock with large time period, so can only give speed in MHz. This is due to synchronous nature of design • [sastry95] design is only 80MHz speed. • Increased Area, redundancy in form of PE’s doing less work. • I/O bandwidth limits the cost model, constraints the cost of operations under a range. • Needs custom hardware and limits the usage of hardware. • Issues with the systolic arrays makes their usage very limited.

Motivation behind our work. • CPU’s are every where • servers, desktops, laptops etc… • Almost all the Bio-Informatics software runs on general CPU’s rather than custom hardware (systolic arrays). • Can we add edit distance instruction to the processor instruction set ? • This can really help software by reducing the constants in asymptotic complexity.

Our Contribution. • Key idea behind our design • “Can we compute edit distance using exactly n+2 memory locations” • We know if that if we need to compute only edit distance we just need to keep track of two rows which is 2n memory locations.

Basic Idea behind our algorithm. T = x

Basic Idea behind our algorithm. T = x Needed for further computation. Just Computed.

Basic Idea behind our algorithm. T = x+1 Needed for further computation. Computed in previous step Redundant Just Computed

Basic Idea behind our algorithm. T = x+2 Needed for further computation. Computed in previous step Redundant Just Computed

Basic Idea behind our algorithm.

Basic Idea behind our algorithm. Shift register of size n+2 Elements are shifted in as they are computed. And redundant elements shifted out.

Top Level Circuit Diagram

Design Block: AlgoShifter

Design Block: ComputeBlock

Design Block: CounterBlock.

Verification Simulation-ex1

Verification Simulation ex-2

Edit Distance Instruction. If we have a t x t edit distance instruction we spend only O(n2/ t2) time in software , thus this instruction is helpful in reducing the constants and speed-up edit distance computation.

Design Metrics.

PART-2: Space and Time Efficient Algorithms for Edit Distance. • Brief overview of Four Russian Algorithm [russian70]. • Brief overview of Hirschberg’s Algorithm [hirschberg75]. • Algorithm to compute edit distance and edit script in O(n2/log(n)) time and O(n) space.

The Four Russian Algorithm. Spend only O(t) time to compute the entries in each block t-block Row Overlap • n2/t2 blocks • idea is to do some pre processing to spend only O(t) time per block • runtime O(n2/t) Column Overlap

Four Russian Algorithm • In unit cost model the following is true • | D[i+1,j] – D[i,j] | <= 1 (across col) • | D[i,j+1] – D[i,j] | <= 1 (across row) • This helps us in characterizing any t-block by two vectors of size t. • The vectors will have only {-1,0,1} • e.g [0,1,2,3,….n] can be replaced by vector [0,1,1,1,….n]

Look Up table for t-block D = [_aaaa] B = [0,1,1,1,1] C = [_aaab] F=[0,-1,-1,-1,0] A = [0,1,1,1,1] E=[0,-1,-1,-1,0] • Preprocessing time • O(3tΣtt2) [E,F] = table(A,B,C,D)

Hirschberg’s Dynamic Programming formulation. align (a1 a2 ….an-1) an (a1 a2 ….an-1 an) (b1 b2 ….bn-1) bn (b1 b2 ….bn-1 bn) Standard DP (a1 a2…an/2 ) (an-1…an) (a1 a2…an/2 ) (an-1…an) (a1 a2…an/2 ) (an-1…an) …….. (.) (……………) (..) (…………) (…) (………)

Hirschberg's Algorithm runtime.

Our Algorithm. • In hirschberg’s algorithm we spend O(n2) time to compute D[n/2,*] and Dr[n/2,*]. • Can we use the Four Russian framework to Compute D[n/2,*] and Dr[n/2,*] in time O(n2/log(n)) O(n) space?

Using Four Russian Framework at each level Space Usage D[n/2-1,*] Dr[n/2-1,*]

Using Four Russian Framework at each level Space Usage

Using Four Russian Framework at each level Space Usage Spend Only O(n2/t) time to compute D[n/2,*] and Dr[n/2,*]

Cases which require row k which is not a multiple of t Space Usage Required this row k Use Four Russian framework till FLOOR(k) spend at most O(nt) time to compute row k. However O(n2/t2) dominates

Runtime and Space Analysis. • Space: • Space during the core algorithm, which we saw is linear. • Space to hold the lookup table after the preprocessing. then the space required would be linear for lookup table

References. [sastry95] R. Sastry, N. Ranganathan, and K. Remedios. CASM: A VLSI chip for approximate string matching. IEEE Trans. Pattern Anal. Mach. Intell., 17(8):824–830, 1995. [lopresti87] D. P. Lopresti. P-NAC: A systolic array for comparing nucleic acid sequences. Computer, 20(7):98–99, 1987. [lipton85] R. J. Lipton and D. Lopresti. A systolic array for rapid string comparison. In Chapel Hill Conf. on VLSI, pages 363–376, 1985. [russian70] V. L. Arlazarov, E. A. Dinic, M. A. Kronrod, and I. A. Faradzev. On economic construction of the transitive closure of a directed graph. Dokl. Akad. Nauk SSSR, 194:487–488, 1970. [hirschberg75] D. S. Hirschberg. Linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341–343, 1975.

Synthesizable, Space and Time Efficient Algorithms for String Editing Problem.