
# 資料結構與演算法 (上): Data Structures and Algorithms (I)

Presentation Transcript
1. 資料結構與演算法(上) 呂學一 (Hsueh-I Lu) http://www.csie.ntu.edu.tw/~hil/ Data Structures and Algorithms (I)

2. Outline of this slide • Dynamic programming • Fibonacci sequence • Stamp problem • Sequence alignment • Matrix multiplication

3. Leonardo Fibonacci (1170-1250)

4. Old hens never die … they just lay eggs! • At the beginning of Day 1, there is a hen. • Each hen lays an egg every 24 hours. • Each egg takes 24 hours to become a hen. • F(n) = the number of hens at the end of Day n. • Give an algorithm to compute F(n).

5. Day 1

6. Day 2

7. Day 3

8. Day 4

9. Day 5

10. E(n) = the number of eggs at the end of Day n • E(n) = ? • F(n) = ?

11. The recurrence relation: F(1) = F(2) = 1, and F(n) = F(n-1) + F(n-2) for n ≥ 3.

12. The recursive algorithm
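The body of this slide did not survive in the transcript, so the following is only a minimal Python sketch of what such a recursive algorithm might look like. It assumes the Fibonacci recurrence F(1) = F(2) = 1 and F(n) = F(n-1) + F(n-2), as suggested by the F(8), F(7), F(6), … call tree on the next slide; the name `fib_recursive` is illustrative.

```python
def fib_recursive(n: int) -> int:
    """Return F(n) by direct recursion on the assumed recurrence (n >= 1)."""
    if n <= 2:
        # Base cases: F(1) = F(2) = 1.
        return 1
    # Each call spawns two further calls, so subproblems are recomputed.
    return fib_recursive(n - 1) + fib_recursive(n - 2)


if __name__ == "__main__":
    print([fib_recursive(k) for k in range(1, 9)])
```

Because every call branches into two, the number of calls grows exponentially with n.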

13. Very inefficient! • It takes time on the order of 1.61803399^n to compute F(n) using the recursive algorithm. • For scale: 1.618^30 ≈ 1.86 × 10^6 and 1.618^365 ≈ 1.89 × 10^76. • Recursion tree, level by level: F(8); F(7) F(6); F(6) F(5) F(5) F(4); F(5) F(4) F(4) F(3) F(4) F(3) F(3) F(2).

14. Dynamic-programming approach: the DP algorithm takes only O(n) time and space!
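The slide's own code is not in the transcript. As a hedged sketch, a table-based version in Python could look like this, again assuming F(1) = F(2) = 1 and F(n) = F(n-1) + F(n-2); `fib_dp` and `table` are illustrative names.

```python
def fib_dp(n: int) -> int:
    """Return F(n) for n >= 1 using a table of n + 1 entries."""
    # table[i] will hold F(i); index 0 is unused.
    table = [0] * (max(n, 2) + 1)
    table[1] = table[2] = 1
    for i in range(3, n + 1):
        # Each entry is computed once from two stored entries.
        table[i] = table[i - 1] + table[i - 2]
    return table[n]


if __name__ == "__main__":
    print(fib_dp(50))
```

The loop runs n - 2 times and the table has n + 1 entries, which matches the O(n) time and space stated on the slide (counting each number as one unit).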

15. Illustration

16. Dynamic Programming • A clever way to implement recursion: • Using storage to avoid unnecessarily duplicated efforts. • Let every path you have walked leave a trace.

17. Question • Is it possible to keep the running time linear while reducing the space to O(1)?
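One possible answer, sketched in Python under the same assumed recurrence F(n) = F(n-1) + F(n-2): since each value depends only on the previous two, two variables can replace the whole table. The function name is illustrative, and "O(1) space" here counts each number as one unit.

```python
def fib_constant_space(n: int) -> int:
    """Return F(n) for n >= 1 while storing only two values."""
    prev, curr = 1, 1  # F(1) and F(2)
    for _ in range(n - 2):
        # Slide the two-value window one step forward.
        prev, curr = curr, prev + curr
    return curr


if __name__ == "__main__":
    print(fib_constant_space(8))
```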

18. Another example: choosing stamps

19. The problem • If the postage is n, what is the minimum number of stamps to cover the postage?

20. A recursive algorithm

21. The DP version: the DP algorithm takes only O(n) time and space!
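The transcript does not list the stamp denominations or the slide's code, so the sketch below makes two loud assumptions: the denominations `(1, 5, 12)` are hypothetical, and "cover" is taken to mean paying the postage exactly. With a fixed set of denominations, the table is filled in O(n) time and space.

```python
import math


def min_stamps(postage: int, denominations=(1, 5, 12)):
    """Return the fewest stamps summing exactly to `postage`.

    `denominations` is a hypothetical example set, not from the slides.
    Returns math.inf if the postage cannot be paid exactly.
    """
    # best[a] = fewest stamps that sum to a.
    best = [0] + [math.inf] * postage
    for amount in range(1, postage + 1):
        for d in denominations:
            # Try d as the last stamp placed.
            if d <= amount and best[amount - d] + 1 < best[amount]:
                best[amount] = best[amount - d] + 1
    return best[postage]


if __name__ == "__main__":
    print(min_stamps(15))
```

With these hypothetical denominations, a postage of 15 needs three stamps (5 + 5 + 5), whereas always taking the largest stamp first would use four (12 + 1 + 1 + 1).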

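22. Illustration

23. Question • So far we have only asked how many stamps are needed. • If we want an actual way to cover the postage with the fewest stamps, that is, how many stamps of each denomination to use, how should we handle it? Does it require additional space?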
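One way to approach this question, sketched in Python: remember, for each amount, which stamp was placed last, then walk back from n. The denominations `(1, 5, 12)` are hypothetical, exact payment is assumed, and the `last` array is an illustrative design choice that costs O(n) additional space.

```python
import math


def min_stamp_choice(postage: int, denominations=(1, 5, 12)):
    """Return {denomination: count} for one optimal choice, or None."""
    best = [0] + [math.inf] * postage
    last = [0] * (postage + 1)  # last[a] = last stamp used to reach a
    for amount in range(1, postage + 1):
        for d in denominations:
            if d <= amount and best[amount - d] + 1 < best[amount]:
                best[amount] = best[amount - d] + 1
                last[amount] = d
    if best[postage] == math.inf:
        return None  # the postage cannot be paid exactly
    # Walk back from `postage`, peeling off one recorded stamp at a time.
    used = {}
    amount = postage
    while amount > 0:
        used[last[amount]] = used.get(last[amount], 0) + 1
        amount -= last[amount]
    return used


if __name__ == "__main__":
    print(min_stamp_choice(15))
```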

24. Sequence Alignment

25. Aligning two strings • A = attgatcctag • B = acttagtccttcgc • A → a-ttga-tcc-tag- • B → actt-agtccttcgc • Each "-" marks a gap (five gaps in total).

26. Measuring an alignment: scoring matrix

27. Other scoring matrices • BLAST matrix • Transition/transversion matrix

28. Scoring matrix is an art • Log odds matrix • score[i, j] = log(q(i, j) / (p(i) p(j))). • PAM matrix • Point accepted mutations • BLOSUM matrix • Blocks substitution matrix • Steven Henikoff and Jorja G. Henikoff (1992). • Other specialized scoring matrices • Domenico Bordo and Patrick Argos (1991). • Jean-Michel Claverie (JCB 1993). • Lee F. Kolakowski and Kenneth A. Rice (Nature 1994)

29. Scoring an alignment • A → a-ttga-tcc-tag- • B → cctt-agtccttcgc • Column scores: -2 -1 +2 +2 -1 +2 -1 +2 +2 +2 -1 +2 -2 +2 -1 • score = 7
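The column scores on this slide are consistent with a simple scheme of +2 for a match, -2 for a mismatch, and -1 for a gap. That scheme is read off from the numbers and is an assumption, not a scoring table stated in the slides. Under it, a short Python sketch for scoring an alignment column by column:

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 2, mismatch: int = -2, gap: int = -1) -> int:
    """Sum the column scores of two equal-length gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap       # one side is a gap
        elif x == y:
            total += match     # identical characters
        else:
            total += mismatch  # different characters
    return total


if __name__ == "__main__":
    print(score_alignment("a-ttga-tcc-tag-", "cctt-agtccttcgc"))  # 7
```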

30. String alignment problem • Input: • two strings A and B; and • a scoring table 分 ("score"). • Output: • an alignment of A and B that has the maximum score with respect to 分.

31. Q: Any naïve methods? • A = attgatcctag • B = ccttagtccttcgc

32. Q: Is there a recursive method? • A = attgatcctag • B = ccttagtccttcgc

33. Yes, but very inefficient!

        int align(m, n) {
            if (m = n = 0) return 0;
            let x = y = z = -∞;
            if (m > 0 and n > 0)
                let x = align(m - 1, n - 1) + Score[A[m], B[n]];
            if (m > 0)
                let y = align(m - 1, n) + Score[A[m], -];
            if (n > 0)
                let z = align(m, n - 1) + Score[-, B[n]];
            return max(x, y, z);
        }
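A runnable Python rendering of this recursive method, offered as a sketch. It assumes the scoring scheme suggested by slide 29 (match +2, mismatch -2, gap -1), and the helper `score` is a hypothetical stand-in for the slide's scoring table. Python strings are 0-indexed, so A[m] becomes `a[m - 1]`.

```python
import math


def score(x: str, y: str) -> int:
    """Hypothetical scoring table: gap -1, match +2, mismatch -2."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -2


def align_recursive(a: str, b: str, m=None, n=None):
    """Best score of aligning the first m chars of a with the first n of b."""
    if m is None:
        m, n = len(a), len(b)
    if m == 0 and n == 0:
        return 0
    x = y = z = -math.inf
    if m > 0 and n > 0:
        # Last column pairs a[m-1] with b[n-1].
        x = align_recursive(a, b, m - 1, n - 1) + score(a[m - 1], b[n - 1])
    if m > 0:
        # Last column pairs a[m-1] with a gap.
        y = align_recursive(a, b, m - 1, n) + score(a[m - 1], "-")
    if n > 0:
        # Last column pairs a gap with b[n-1].
        z = align_recursive(a, b, m, n - 1) + score("-", b[n - 1])
    return max(x, y, z)


if __name__ == "__main__":
    print(align_recursive("attga", "ccttagtc"))
```

Each call can branch three ways, so the running time grows exponentially with the string lengths; only short inputs are practical.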

34. Alignment graph (figure: a grid graph whose columns are labeled by B = ccttagtc and whose rows are labeled by A = attga).

35. Observations • Each alignment corresponds to a maximal path on the alignment graph. • The score of an alignment is the score of its corresponding maximal path. • Annotation: "none before, none after" (前無古人，後無來者), that is, a maximal path cannot be extended at either end. • Example: B → cctt-agtc, A → a-ttga---.

36. Score of edges • The horizontal edge into node (i, j) scores 分[-, B[j]]. • The vertical edge into node (i, j) scores 分[A[i], -]. • The diagonal edge into node (i, j) scores 分[A[i], B[j]].

37. The graph problem: finding a maximal path with maximum score on the alignment graph (a directed acyclic graph).

38. Idea • For each i = 0, 1, …, |A| and each j = 0, 1, …, |B|, let 點[i, j] ("node") keep the maximum score of aligning A[1…i] and B[1…j]. (Figure: the grid with rows i = 0, 1, …, |A| and columns j = 0, 1, …, |B|.)

39. An observation • 點[i, j] = the maximum of • 點[i-1, j-1] + 分[A[i], B[j]], • 點[i-1, j] + 分[A[i], -], and • 點[i, j-1] + 分[-, B[j]].

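40. For example (rows: A = attga; columns: B = ccttagtc)

|   |    | c  | c  | t  | t  | a  | g  | t  | c  |
|---|----|----|----|----|----|----|----|----|----|
|   | 0  | -1 | -2 | -3 | -4 | -5 | -6 | -7 | -8 |
| a | -1 | -2 | -3 | -4 | -5 | -2 | -3 | -4 | -5 |
| t | -2 | -3 | -4 | -1 | -2 | -3 | -4 | -1 | -2 |
| t | -3 | -4 | -5 | -2 | 1  | 0  | -1 | -2 | -3 |
| g | -4 | -5 | -6 | -3 | 0  | -1 | 2  | 1  | 0  |
| a | -5 | -6 | -7 | -4 | -1 | 2  | 1  | 0  | -1 |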

41. The DP version.

        int align(m, n) {
            let C[0, 0] = 0;
            for i = 1 to m
                let C[i, 0] = C[i - 1, 0] + Score[A[i], -];
            for j = 1 to n
                let C[0, j] = C[0, j - 1] + Score[-, B[j]];
            for i = 1 to m
                for j = 1 to n {
                    let x = C[i - 1, j - 1] + Score[A[i], B[j]];
                    let y = C[i - 1, j] + Score[A[i], -];
                    let z = C[i, j - 1] + Score[-, B[j]];
                    let C[i, j] = max(x, y, z);
                }
            return C[m, n];
        }
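A Python sketch of this DP version. It assumes the scoring scheme suggested by slide 29 (match +2, mismatch -2, gap -1); the name `align_table` is illustrative. With A = attga and B = ccttagtc it reproduces the score table shown on slide 40.

```python
def align_table(a: str, b: str,
                match: int = 2, mismatch: int = -2, gap: int = -1):
    """Return the full table C, where C[i][j] is the best score of
    aligning the first i characters of a with the first j of b."""
    m, n = len(a), len(b)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        C[i][0] = C[i - 1][0] + gap   # a[:i] against gaps only
    for j in range(1, n + 1):
        C[0][j] = C[0][j - 1] + gap   # gaps only against b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + sub,  # diagonal edge
                          C[i - 1][j] + gap,      # vertical edge
                          C[i][j - 1] + gap)      # horizontal edge
    return C


if __name__ == "__main__":
    for row in align_table("attga", "ccttagtc"):
        print(row)
```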

42. Complexity • Space = O(|A|×|B|). • Each node keeps a score and a pointer, and thus requires only O(1) space. • Time = O(|A|×|B|). • The content of each node can be obtained from those of at most three nodes in O(1) time.

43. Question • So far we have only computed the optimal score. • If we want an alignment that achieves this optimal score, how should we handle it? Does it require additional space?
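One way to approach this question, sketched in Python: once the full table is filled, an optimal alignment can be read off by walking back from the bottom-right entry and checking which of the three neighbors produced each value. This variant re-derives the choice instead of storing pointers, which is an illustrative design choice. The scoring scheme (match +2, mismatch -2, gap -1) is the one suggested by slide 29.

```python
def best_alignment(a: str, b: str,
                   match: int = 2, mismatch: int = -2, gap: int = -1):
    """Return (score, gapped_a, gapped_b) for one optimal alignment."""
    m, n = len(a), len(b)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        C[i][0] = C[i - 1][0] + gap
    for j in range(1, n + 1):
        C[0][j] = C[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + sub,
                          C[i - 1][j] + gap,
                          C[i][j - 1] + gap)
    # Walk back from C[m][n], collecting columns in reverse order.
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and C[i][j] == C[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and C[i][j] == C[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
    return C[m][n], "".join(reversed(row_a)), "".join(reversed(row_b))


if __name__ == "__main__":
    print(best_alignment("attga", "ccttagtc"))
```

The walk visits at most |A| + |B| entries, so it adds no asymptotic cost beyond the table itself.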

44. For example • Retrace the path by which you came (回顧來時徑). (Figure: the same score table as on slide 40.)

45. Complexity • Space = O(|A|×|B|). • Each node keeps a score and a pointer, and thus requires only O(1) space. • Time = O(|A|×|B|). • The content of each node can be obtained from those of at most three nodes in O(1) time.

46. Challenge: reducing the space complexity

47. First attempt • What is the problem? (Figure: the same score table as on slide 40.)

48. Knowing the maximum score, but … not knowing the corresponding alignment.
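The transcript does not show what the first attempt was. One plausible reading, given as an assumption, is to keep only the previous row of the table, since each entry depends only on the current and previous rows. The Python sketch below does that with the scoring scheme suggested by slide 29 (match +2, mismatch -2, gap -1); it returns the maximum score using O(|B|) space, but the discarded rows mean the alignment itself can no longer be traced back.

```python
def align_score_two_rows(a: str, b: str,
                         match: int = 2, mismatch: int = -2,
                         gap: int = -1) -> int:
    """Return only the best alignment score, keeping two rows at a time."""
    # prev[j] = best score of aligning the empty prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [prev[0] + gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,  # diagonal
                          prev[j] + gap,      # vertical
                          curr[j - 1] + gap)  # horizontal
        prev = curr  # the older row is discarded here
    return prev[-1]


if __name__ == "__main__":
    print(align_score_two_rows("attga", "ccttagtc"))
```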

49. Q: Can we deduce an optimal alignment from the optimal score?

50. A key observation • An optimal path passes 點[i, j] if and only if 分(A, B) is the sum of 分(A[1…i], B[1…j]) and 分(A[i+1…|A|], B[j+1…|B|]), where 分(X, Y) denotes the maximum alignment score of X and Y. (Figure: the alignment graph for A = attga and B = ccttagtc.)
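To illustrate this observation, the Python sketch below computes prefix scores with one table and suffix scores with a second table built on the reversed strings, then marks every node (i, j) where the two add up to the optimal score. The scoring scheme (match +2, mismatch -2, gap -1) is the one suggested by slide 29, and the function names are illustrative.

```python
def score_table(a: str, b: str,
                match: int = 2, mismatch: int = -2, gap: int = -1):
    """C[i][j] = best score of aligning a[:i] with b[:j]."""
    m, n = len(a), len(b)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        C[i][0] = C[i - 1][0] + gap
    for j in range(1, n + 1):
        C[0][j] = C[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + sub,
                          C[i - 1][j] + gap,
                          C[i][j - 1] + gap)
    return C


def optimal_path_nodes(a: str, b: str):
    """grid[i][j] is True when some optimal path passes node (i, j)."""
    m, n = len(a), len(b)
    fwd = score_table(a, b)
    # rev[p][q] = best score of the last p chars of a vs the last q of b.
    rev = score_table(a[::-1], b[::-1])
    best = fwd[m][n]
    return [[fwd[i][j] + rev[m - i][n - j] == best for j in range(n + 1)]
            for i in range(m + 1)]


if __name__ == "__main__":
    for row in optimal_path_nodes("attga", "ccttagtc"):
        print("".join("*" if on_path else "." for on_path in row))
```

Every row of the grid contains at least one marked node, because a path from the top-left corner to the bottom-right corner must cross each row.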