1 / 25

Sequence analysis of nucleic acids and proteins: part 1

Sequence analysis of nucleic acids and proteins: part 1. Similarity search Lecture by Terry Speed. Based on Chapter 3 of Post-genome Bioinformatics by Minoru Kanehisa, Oxford University Press, 2000.

omer
Download Presentation

Sequence analysis of nucleic acids and proteins: part 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sequence analysis of nucleic acids and proteins: part 1 Similarity search Lecture by Terry Speed Based on Chapter 3 of Post-genome Bioinformatics by Minoru Kanehisa, Oxford University Press, 2000

  2. Pairwise sequence alignment by the dynamic programming algorithm. The algorithm involves finding the optimal path in the path matrix. (a), which is equivalent to searching the optimal solution in the search tree (b). (a) Path Matrix(b) Search Tree A I M S A M O S X X . . . . . . . . . . . . . . Alignment AIM-S A-MOS Pruning by an optimization function

  3. Di, j-l Di, j-l Di-1, j-1 Di-1, j Di-1, j Methods for computing the optimal score in the dynamic programming algorithm (a ) the gap penalty is a constant. (b) the gap penalty is a linear function of the gap length. (a) (b) Di-1, j-1 d ws(i), t(j) d b Di, j(2) Di,j ws(i), t(j) b Di,j(1) Di,j(3)

  4. 0 0 . . . . . . 0 . . . . 0 Concepts of global and local optimality in the pairwise sequence alignment. The distinction is made as to how the initial values are assigned to the path matrix. (a) Global vs. Global (b) Local vs. Global 0 0 . . . . . . 0 0 (c) Local vs. Local 0 0 . . . . . . 0 . . . . 0 X

  5. Dynamic programming to find edit distances - Edit operation: M, R, I, D - Edit transcript: A string over the alphabet M, R, I, D that describes a transformation of one string into another. Example: R D I M D M M A - T H S A - R T - S - Edit (Levens(h)tein) distance: The minimum number of edit operations necessary to transform one string into another. (Note: matches are not counted.) Example: R D I M D M 1+ 1+ 1+ 0+ 1+ 0 = 4

  6. The recurrence - Stage: position in the edit transcript; - State: I, D, M, or R; - Optimal value function: D(i, j) where D(i, j) = edit distance of Seq1[1...i] and Seq2[1...j] - Recurrence relation: 1 +D(i-1, j) D(i, j) = min 1 +D(i, j-1) t(i, j) +D(i-1, j-1) , where t(i, j) = {

  7. The tabulation , D(i, j) Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 M 1 A 2 T 3 H 4 S 5

  8. The tabulation , D(i, j) Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 M 1 A 2 T 3 H 4 S 5

  9. The tabulation , D(i, j) Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 M 1 A 2 T 3 H 4 S 5

  10. The tabulation , D(i, j) Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 2 M 1 A 2 T 3 H 4 S 5

  11. The tabulation , D(i, j) Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 2 3 4 M 1 1 A 2 2 T 3 3 H 4 4 S 5 5

  12. The tabulation , D(i, j) Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 2 3 4 M 1 1 1 A 2 2 T 3 3 H 4 4 S 5 5

  13. The tabulation , D(i, j) Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 2 3 4 M 1 1 1 2 A 2 2 T 3 3 H 4 4 S 5 5

  14. The tabulation , D(i, j) Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 2 3 4 M 1 1 1 2 3 4 A 2 2 1 2 3 4 T 3 3 H 4 4 S 5 5

  15. The tabulation , D(i, j) Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 2 3 4 M 1 1 1 2 3 4 A 2 2 1 2 3 4 T 3 3 2 2 2 3 H 4 4 S 5 5

  16. The tabulation , D(i, j) Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 2 3 4 M 1 1 1 2 3 4 A 2 2 1 2 3 4 T 3 3 2 2 2 3 H 4 4 3 3 3 3 S 5 5 4 4 4 3

  17. The traceback Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 2 3 4 M 1 1 1 2 3 4 A 2 2 1 2 3 4 T 3 3 2 2 2 3 H 4 4 3 3 3 3 S 5 5 4 4 4 3

  18. The solutions - #1 1 0 1 1 0 = 3 D M R R M M A T H S - A R T S

  19. The traceback Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 2 3 4 M 1 1 1 2 3 4 A 2 2 1 2 3 4 T 3 3 2 2 2 3 H 4 4 3 3 3 3 S 5 5 4 4 4 3

  20. The solutions - #2 1 0 1 0 1 0 = 3 D M I M D M M A - T H S - A R T - S

  21. The traceback Seq2(j) A R T S Seq1(i) 0 1 2 3 4 0 0 1 2 3 4 M 1 1 1 2 3 4 A 2 2 1 2 3 4 T 3 3 2 2 2 3 H 4 4 3 3 3 3 S 5 5 4 4 4 3

  22. The solutions - #3 1 1 0 1 0 = 3 R R M D M M A T H S A R T - S “Life must be lived forwards and understood backwards.” - Søren Kierkegaard

  23. BLOSUM62 SCORING MATRIX 134 LQQGELDLVMTSDILPRSELHYSPMFDFEVRLVLAPDHPLASKTQITPEDLASETLLI | ||| | | |||||| | || || 137 LDSNSVDLVLMGVPPRNVEVEAEAFMDNPLVVIAPPDHPLAGERAISLARLAEETFVM D:D = +6 D:R = -2 From Henikoff 1996

  24. Scoring Matrices • Physical/Chemical similarities • comparing two sequences according to the properties of their residues may highlight regions of structural similarity • Identity matrices • by stressing only identities in the alignment, stretches of sequence that may have diverged will not penalise any remaining common features

  25. Scoring Matrices (ctd) • As the direct source of residue by residue comparison scores the scoring matrix you choose will have a major impact on the alignment calculated • The most commonly used will be one of the mutation matrices • PAM, BLOSUM • The matrix that performs best will be the matrix that reflects the evolutionary separation of the sequences being aligned

More Related