Synthesizable, Space and Time Efficient Algorithms for String Editing Problem.


# Synthesizable, Space and Time Efficient Algorithms for String Editing Problem.

##### Presentation Transcript

1. Synthesizable, Space and Time Efficient Algorithms for String Editing Problem. Vamsi K. Kundeti

2. Agenda. • Synthesizable: • Digital circuit to implement edit distance in hardware. • High speed and area efficient. • Space and time efficient algorithms: • Computing the edit script and edit distance in O(n^2/log n) time and O(n) space.

3. Edit Distance Optimization Problem

4. Edit Distance in hardware. • Related work. • Parallel systolic array based designs. • Issues with systolic arrays. • e.g. [lipton85], [lopresti87] & [sastry95] • Sequential design. • Area efficient and high speed. • Adding edit distance to the instruction set of a general-purpose CPU. • Speedup by reduction in constants.

5. Basic idea behind systolic arrays. [Figure: a linear array of processing elements PE-1 to PE-7 over the DP table; labels mark the entries computed in parallel and the entries computed by a single processor.]

6. Basic idea behind systolic arrays. [Figure, T = x: the same array PE-1 to PE-7; the highlighted entries can be computed in parallel.]

7. Basic idea behind systolic arrays. [Figure, T = x+1 and T = x+2: the same array PE-1 to PE-7; the set of entries computed in parallel advances by one step each time.]
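The parallelism in slides 5 to 7 can be illustrated in software. The sketch below is not the systolic design itself; it assumes unit costs and simply visits the DP table one anti-diagonal at a time, since cells with the same `i + j` do not depend on each other and could be handed to separate PEs. The function name and the `steps` counter are illustrative choices, not taken from the slides.

```python
def edit_distance_wavefront(a, b):
    """Fill the edit distance table one anti-diagonal (wavefront) at a time.

    Returns (distance, steps), where steps counts the wavefronts, i.e. the
    number of time steps an idealized parallel array would need.
    """
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    steps = 0
    for d in range(m + n + 1):
        # All cells (i, j) with i + j == d only read cells from wavefronts
        # d - 1 and d - 2, so they are independent of each other.
        for i in range(max(0, d - n), min(m, d) + 1):
            j = d - i
            if i == 0:
                D[i][j] = j
            elif j == 0:
                D[i][j] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                D[i][j] = min(D[i - 1][j - 1] + cost,  # match / substitute
                              D[i - 1][j] + 1,         # delete
                              D[i][j - 1] + 1)         # insert
        steps += 1
    return D[m][n], steps
```

For two strings of length n the loop runs 2n + 1 wavefronts, which is the O(n) step count that the later slides credit to systolic arrays.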

8. Systolic Array Issues. Example: S1 = [abc], S2 = [bca]. [Figure: inputs a_b_c and b_c_a on the array pe-1 to pe-5.] 1. pe-2 and pe-4 have to wait until pe-1 is done (synchronous). 2. pe-3 does more computation than the others. 3. Increased I/O complexity.

9. Systolic Array Problems. • Pros: • Need only O(n) steps to compute edit distance. • Cons: • Design is too complex. • Although we need only O(n) time, we pay a big price. • Clock speed reduction: the design needs a clock with a large time period, so it can only run at speeds in the MHz range. This is due to the synchronous nature of the design. • The [sastry95] design runs at only 80 MHz. • Increased area, with redundancy in the form of PEs doing less work. • I/O bandwidth limits the cost model and constrains the cost of operations to a range. • Needs custom hardware, which limits its usage. • These issues make the usage of systolic arrays very limited.

10. Motivation behind our work. • CPUs are everywhere: • servers, desktops, laptops, etc. • Almost all bioinformatics software runs on general-purpose CPUs rather than custom hardware (systolic arrays). • Can we add an edit distance instruction to the processor instruction set? • This can help software by reducing the constants hidden in the asymptotic complexity.

11. Our Contribution. • Key idea behind our design: • “Can we compute edit distance using exactly n+2 memory locations?” • We know that if we need to compute only the edit distance, we just need to keep track of two rows, which is 2n memory locations.
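The two-row baseline mentioned on this slide can be sketched as follows. This is the textbook formulation under unit costs, written here for illustration; the function name is an assumption and nothing in it is specific to the hardware design.

```python
def edit_distance_two_rows(a, b):
    """Edit distance keeping only the previous and the current DP row."""
    prev = list(range(len(b) + 1))          # row 0: D[0][j] = j
    for i, ca in enumerate(a, 1):
        curr = [i]                          # column 0: D[i][0] = i
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j - 1] + cost,  # match / substitute
                            prev[j] + 1,         # delete
                            curr[j - 1] + 1))    # insert
        prev = curr                         # the older row is no longer needed
    return prev[-1]
```

Both lists have one entry per column, so the working storage is about twice the row length, which is the 2n figure the slide sets out to improve on.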

12. Basic Idea behind our algorithm. [Figure, T = x: cells marked "Needed for further computation" and "Just computed".]

13. Basic Idea behind our algorithm. [Figure, T = x+1: cells marked "Needed for further computation", "Computed in previous step", "Redundant", and "Just computed".]

14. Basic Idea behind our algorithm. [Figure, T = x+1: same legend as slide 13.]

15. Basic Idea behind our algorithm. [Figure, T = x+2: cells marked "Needed for further computation", "Computed in previous step", "Redundant", and "Just computed".]

16. Basic Idea behind our algorithm. [Figure, T = x+2: same legend as slide 15.]

17. Basic Idea behind our algorithm. [Figure, T = x+2: same legend as slide 15.]

18. Basic Idea behind our algorithm. [Figure, T = x+2: same legend as slide 15.]

19. Basic Idea behind our algorithm.

20. Basic Idea behind our algorithm.

21. Basic Idea behind our algorithm.

22. Basic Idea behind our algorithm.

23. Basic Idea behind our algorithm. A shift register of size n+2: elements are shifted in as they are computed, and redundant elements are shifted out.
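One way to read the shift-register idea in software is sketched below. It is an interpretation under stated assumptions, not the circuit from the slides: unit costs, row-major evaluation of the table including the boundary cells, and a `collections.deque` with `maxlen = n + 2` standing in for the register, where n is the length of the second string. With that ordering the three neighbours of the cell being computed always sit at fixed positions in the register.

```python
from collections import deque


def edit_distance_shift_register(a, b):
    """Edit distance using a single window of the last n + 2 computed cells."""
    n = len(b)
    # A bounded deque drops its oldest entry on append once it is full,
    # which models redundant values being shifted out.
    reg = deque(maxlen=n + 2)
    for i in range(len(a) + 1):
        for j in range(n + 1):
            if i == 0:
                value = j                   # first row
            elif j == 0:
                value = i                   # first column
            else:
                diag = reg[0]               # D[i-1][j-1], n + 2 cells back
                up = reg[1]                 # D[i-1][j],   n + 1 cells back
                left = reg[-1]              # D[i][j-1],   the newest cell
                cost = 0 if a[i - 1] == b[j - 1] else 1
                value = min(diag + cost, up + 1, left + 1)
            reg.append(value)               # shift the new cell in
    return reg[-1]
```

The fixed offsets follow from the row-major order: each row has n + 1 cells, so the cell directly above is n + 1 positions back and the diagonal cell is n + 2 positions back.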

24. Top Level Circuit Diagram

25. Design Block: AlgoShifter

26. Design Block: ComputeBlock

27. Design Block: CounterBlock.

28. Verification Simulation-ex1

29. Verification Simulation ex-2

30. Edit Distance Instruction. If we have a t × t edit distance instruction, we spend only O(n^2/t^2) time in software; the instruction thus helps reduce the constants and speeds up edit distance computation.
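The O(n^2/t^2) count can be made concrete with a small simulation. The sketch below is hypothetical: `block_instruction` is a software stand-in for the proposed t × t instruction (its interface is an assumption, since the slides do not define one), and `edit_distance_blocked` is a driver that tiles the table and counts how many times the instruction would be issued.

```python
def block_instruction(a_blk, b_blk, top, left):
    """Software stand-in for a hypothetical t x t edit distance instruction.

    top  : values of the row just above the block (corner excluded)
    left : values of the column just left of the block (corner included)
    Returns the block's bottom row (corner excluded) and right column
    (top-right corner included).
    """
    prev = [left[0]] + list(top)
    right = [prev[-1]]
    for i, ca in enumerate(a_blk, 1):
        curr = [left[i]]
        for j, cb in enumerate(b_blk, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j - 1] + cost, prev[j] + 1, curr[j - 1] + 1))
        prev = curr
        right.append(curr[-1])
    return prev[1:], right


def edit_distance_blocked(a, b, t):
    """Tile the DP table into t x t blocks; return (distance, instruction calls)."""
    m, n = len(a), len(b)
    row = list(range(n + 1))                 # boundary row D[0][*]
    calls = 0
    for i0 in range(0, m, t):
        th = min(t, m - i0)
        left = list(range(i0, i0 + th + 1))  # boundary column D[i0..i0+th][0]
        new_row = [left[-1]]
        for j0 in range(0, n, t):
            tw = min(t, n - j0)
            top = row[j0 + 1:j0 + tw + 1]
            bottom, left = block_instruction(a[i0:i0 + th], b[j0:j0 + tw], top, left)
            new_row.extend(bottom)
            calls += 1
        row = new_row
    return row[-1], calls
```

For strings of length 6 and 7 with t = 4, the driver issues ceil(6/4) × ceil(7/4) = 4 calls instead of visiting 42 cells one by one.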

31. Design Metrics.

32. PART-2: Space and Time Efficient Algorithms for Edit Distance. • Brief overview of the Four Russian Algorithm [russian70]. • Brief overview of Hirschberg’s Algorithm [hirschberg75]. • Algorithm to compute edit distance and edit script in O(n^2/log n) time and O(n) space.

33. The Four Russian Algorithm. [Figure: DP table tiled into t-blocks that share a row overlap and a column overlap; label: spend only O(t) time to compute the entries in each block.] • n^2/t^2 blocks. • The idea is to do some preprocessing so that we spend only O(t) time per block. • Runtime: O(n^2/t).

34. Four Russian Algorithm. • In the unit cost model the following is true: • |D[i+1,j] - D[i,j]| <= 1 (across col) • |D[i,j+1] - D[i,j]| <= 1 (across row) • This lets us characterize any t-block by two vectors of size t. • The vectors contain only values from {-1,0,1}. • e.g. [0,1,2,3,…,n] can be replaced by the vector [0,1,1,1,…,1].
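The bounded-difference property and the offset encoding can be checked on a small table. The sketch below assumes unit costs; the helper names are illustrative, and the leading 0 in the offset vector follows the slide's example rather than any fixed convention.

```python
def edit_table(a, b):
    """Full unit-cost edit distance table D for strings a and b."""
    D = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 or j == 0:
                D[i][j] = i + j             # boundary: i or j
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                D[i][j] = min(D[i - 1][j - 1] + cost,
                              D[i - 1][j] + 1,
                              D[i][j - 1] + 1)
    return D


def to_offsets(values):
    """Encode a row or column as a leading 0 followed by adjacent differences."""
    return [0] + [values[k] - values[k - 1] for k in range(1, len(values))]


def from_offsets(first, offsets):
    """Rebuild the absolute values from the first value and the offsets."""
    out = [first]
    for d in offsets[1:]:
        out.append(out[-1] + d)
    return out


D = edit_table("kitten", "sitting")
row_offsets = [to_offsets(row) for row in D]
col_offsets = [to_offsets(list(col)) for col in zip(*D)]
```

Every entry of `row_offsets` and `col_offsets` lies in {-1, 0, 1}, so a block boundary of t values can be stored as t small symbols plus one absolute value.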

35. Look Up table for t-block. Example: A = [0,1,1,1,1], B = [0,1,1,1,1], C = [_aaab], D = [_aaaa]; E = [0,-1,-1,-1,0], F = [0,-1,-1,-1,0]. [E,F] = table(A,B,C,D). • Preprocessing time: O(3^t Σ^t t^2).
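A minimal sketch of such a lookup table follows, assuming unit costs. The representation is an assumption made for illustration: offset vectors and strings have length t, so the leading 0 and the `_` placeholder from the slide's example are dropped, and the table is a plain dictionary keyed by (left offsets, top offsets, row string, column string). It enumerates every combination, so it is only practical for very small t and alphabets.

```python
from itertools import product


def block_offsets(left, top, rows, cols):
    """Compute one t-block from its input offsets.

    left, top : offset vectors (values in {-1, 0, 1}) along the left column
                and the top row, relative to a corner value of 0
    rows, cols: the t characters labelling the block's rows and columns
    Returns (bottom offsets, right offsets).
    """
    t = len(rows)
    prev = [0]
    for d in top:
        prev.append(prev[-1] + d)           # absolute top row, corner = 0
    col0 = [0]
    for d in left:
        col0.append(col0[-1] + d)           # absolute left column, corner = 0
    right = [prev[-1]]
    for i in range(1, t + 1):
        curr = [col0[i]]
        for j in range(1, t + 1):
            cost = 0 if rows[i - 1] == cols[j - 1] else 1
            curr.append(min(prev[j - 1] + cost, prev[j] + 1, curr[j - 1] + 1))
        prev = curr
        right.append(curr[-1])
    bottom = tuple(prev[j] - prev[j - 1] for j in range(1, t + 1))
    right_off = tuple(right[i] - right[i - 1] for i in range(1, t + 1))
    return bottom, right_off


def build_table(t, alphabet):
    """Precompute the outputs for every offset/string combination of size t."""
    vectors = list(product((-1, 0, 1), repeat=t))
    strings = list(product(alphabet, repeat=t))
    return {key: block_offsets(*key)
            for key in product(vectors, vectors, strings, strings)}
```

Calling `block_offsets((1, 1, 1, 1), (1, 1, 1, 1), "aaab", "aaaa")` reproduces the slide's example, with both outputs equal to `(-1, -1, -1, 0)`.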

36. Hirschberg’s Dynamic Programming formulation. [Figure: the standard DP aligns (a1 a2 … an-1) an against (b1 b2 … bn-1) bn; Hirschberg’s formulation instead splits a at its midpoint into (a1 a2 … an/2) and (an/2+1 … an) and considers every way of splitting b into a prefix and a suffix.]

37. Hirschberg's Algorithm runtime.
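The divide-and-conquer formulation of slides 36 and 37 can be sketched as below. This is a textbook-style rendering under unit costs, written for illustration rather than taken from the slides; the operation names and the tuple format of the script are assumptions. `last_row` plays the role of computing D[n/2,*] (and, on reversed inputs, Dr[n/2,*]) in linear space.

```python
def last_row(a, b):
    """Last row of the edit distance table for a versus b, in linear space."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j - 1] + cost, prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev


def hirschberg(a, b):
    """Return an optimal edit script as a list of (op, char_from_a, char_from_b)."""
    if not a:
        return [("insert", "", cb) for cb in b]
    if not b:
        return [("delete", ca, "") for ca in a]
    if len(a) == 1:
        if a in b:                          # keep the character, insert the rest
            k = b.index(a)
            return ([("insert", "", cb) for cb in b[:k]]
                    + [("match", a, a)]
                    + [("insert", "", cb) for cb in b[k + 1:]])
        return ([("substitute", a, b[0])]
                + [("insert", "", cb) for cb in b[1:]])
    mid = len(a) // 2
    fwd = last_row(a[:mid], b)                   # forward costs into the middle row
    rev = last_row(a[mid:][::-1], b[::-1])       # reverse costs into the middle row
    n = len(b)
    split = min(range(n + 1), key=lambda j: fwd[j] + rev[n - j])
    return hirschberg(a[:mid], b[:split]) + hirschberg(a[mid:], b[split:])


script = hirschberg("kitten", "sitting")
cost = sum(op != "match" for op, _, _ in script)
```

Each recursion level only holds two rows at a time, and the two recursive calls together cover half the table area of their parent, which is what keeps the total time quadratic while the space stays linear.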

38. Our Algorithm. • In Hirschberg’s algorithm we spend O(n^2) time to compute D[n/2,*] and Dr[n/2,*]. • Can we use the Four Russian framework to compute D[n/2,*] and Dr[n/2,*] in O(n^2/log n) time and O(n) space?

39. Using Four Russian Framework at each level. [Figure: space usage; rows D[n/2-1,*] and Dr[n/2-1,*].]

40. Using Four Russian Framework at each level. [Figure: space usage.]

41. Using Four Russian Framework at each level. [Figure: space usage.] Spend only O(n^2/t) time to compute D[n/2,*] and Dr[n/2,*].

42. Using Four Russian Framework at each level. [Figure: space usage.] Spend only O(n^2/t) time to compute D[n/2,*] and Dr[n/2,*].

43. Cases which require a row k that is not a multiple of t. [Figure: space usage; the required row k.] Use the Four Russian framework up to row FLOOR(k), the largest multiple of t not exceeding k, then spend at most O(nt) time to compute row k. However, O(n^2/t) dominates.

44. Runtime and Space Analysis. • Space: • Space during the core algorithm, which we saw is linear. • Space to hold the lookup table after the preprocessing; for a suitably chosen t, the space required for the lookup table is also linear.
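Only the space bullets of this slide survive in the transcript. A hedged reconstruction of the matching runtime argument is given below. It assumes the standard Hirschberg recursion, in which the total table area handled at recursion depth k is n^2/2^k, a per-level cost of area divided by t as stated on slides 41 and 42, and a block size t = Θ(log n). The last assumption is a reading of the O(n^2/log n) bound from slide 32, not something the slides state. T_pre(t) denotes the preprocessing time from slide 35 and c is an unspecified constant.

```latex
T(n) \;\le\; \sum_{k=0}^{\log_2 n} c\,\frac{n^2}{2^k\,t} \;+\; T_{\mathrm{pre}}(t)
     \;\le\; \frac{2c\,n^2}{t} \;+\; T_{\mathrm{pre}}(t)
     \;=\; O\!\left(\frac{n^2}{\log n}\right)
     \quad \text{for } t = \Theta(\log n),
```

provided the constant in t is small enough that T_pre(t) stays below n^2/log n.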

45. References. [sastry95] R. Sastry, N. Ranganathan, and K. Remedios. CASM: A VLSI chip for approximate string matching. IEEE Trans. Pattern Anal. Mach. Intell., 17(8):824–830, 1995. [lopresti87] D. P. Lopresti. P-NAC: A systolic array for comparing nucleic acid sequences. Computer, 20(7):98–99, 1987. [lipton85] R. J. Lipton and D. Lopresti. A systolic array for rapid string comparison. In Chapel Hill Conf. on VLSI, pages 363–376, 1985. [russian70] V. L. Arlazarov, E. A. Dinic, M. A. Kronrod, and I. A. Faradzev. On economic construction of the transitive closure of a directed graph. Dokl. Akad. Nauk SSSR, 194:487–488, 1970. [hirschberg75] D. S. Hirschberg. Linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341–343, 1975.