1. Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez
2. Outline
Overview of global and local alignment
References for sequence alignment algorithms
Discussion of Needleman-Wunsch iterative approach to global alignment
Discussion of Smith-Waterman recursive approach to local alignment
Discussion of how the LCS algorithm can be extended for:
  Global alignment (Needleman-Wunsch)
  Local alignment (Smith-Waterman)
Group assignments for project
3. Overview of Pairwise Sequence Alignment
Dynamic Programming
  Applied to optimization problems
  Useful when:
    The problem can be recursively divided into sub-problems
    The sub-problems are not independent
Needleman-Wunsch is a global alignment technique that uses an iterative algorithm and no gap penalty (could extend to fixed gap penalty).
Smith-Waterman is a local alignment technique that uses a recursive algorithm and can use alternative gap penalties (such as affine). The Smith-Waterman algorithm extends the Longest Common Subsequence (LCS) problem and can be generalized to solve both local and global alignment.
Note: Needleman-Wunsch is usually used to refer to global alignment regardless of the algorithm used.
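To make the "sub-problems are not independent" condition concrete, here is a minimal sketch of a recursive LCS-length computation that caches sub-problem results. The function name `lcs_length_recursive` and the choice to return the cache-hit count are illustrative assumptions, not part of the original slides.

```python
from functools import lru_cache


def lcs_length_recursive(v: str, w: str) -> tuple[int, int]:
    """Return (LCS length, number of cached sub-problem reuses)."""

    @lru_cache(maxsize=None)
    def s(i: int, j: int) -> int:
        # s(i, j) is the LCS length of the prefixes v[:i] and w[:j].
        if i == 0 or j == 0:
            return 0
        best = max(s(i - 1, j), s(i, j - 1))
        if v[i - 1] == w[j - 1]:
            best = max(best, s(i - 1, j - 1) + 1)
        return best

    length = s(len(v), len(w))
    # Each cache hit is a sub-problem that was needed more than once.
    return length, s.cache_info().hits


if __name__ == "__main__":
    length, reused = lcs_length_recursive("ATCTGAT", "TGCATA")
    print(length, reused > 0)
```

A non-zero hit count shows that the same sub-problem is requested from several places, which is what makes storing results in a table worthwhile.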
4. References
http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
An Introduction to Bioinformatics Algorithms (Computational Molecular Biology), Neil C. Jones and Pavel Pevzner
Computational Molecular Biology – An Algorithmic Approach, Pavel Pevzner
Introduction to Computational Biology – Maps, Sequences, and Genomes, Michael Waterman
Algorithms on Strings, Trees, and Sequences – Computer Science and Computational Biology, Dan Gusfield
5. Classic Papers
Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970. (http://www.cs.umd.edu/class/spring2003/cmsc838t/papers/needlemanandwunsch1970.pdf)
Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981. (http://www.cmb.usc.edu/papers/msw_papers/msw-042.pdf)
6. Why search sequence databases?
I have just sequenced something. What is known about the thing I sequenced?
I have a unique sequence. Is there similarity to another gene that has a known function?
I found a new protein sequence in a lower organism. Is it similar to a protein from another species?
7. Global Alignment Method
8. Global Alignment Method (cont. 1)
9. Global Alignment Method (cont. 2)
10. Global Alignment Method (cont. 3)
11. Global Alignment Method (cont. 4)
12. Global Alignment Method (cont. 5)
13. Three steps in Dynamic Programming
19. Global Alignment output file
20. LCS Problem (review)
Similarity score
\[
s_{i,j} = \max
\begin{cases}
s_{i-1,j} \\
s_{i,j-1} \\
s_{i-1,j-1} + 1, & \text{if } v_i = w_j
\end{cases}
\]
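The LCS recurrence can be filled in iteratively, row by row. The following is a minimal sketch; the function name `lcs_length` and the convention that row 0 and column 0 hold zeros (empty prefixes) are assumptions made for illustration.

```python
def lcs_length(v: str, w: str) -> int:
    """Length of the longest common subsequence of v and w."""
    n, m = len(v), len(w)
    # s[i][j] is the LCS length of v[:i] and w[:j]; row 0 and column 0 stay 0.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(s[i - 1][j], s[i][j - 1])
            # The diagonal move is only allowed when the characters match.
            if v[i - 1] == w[j - 1]:
                best = max(best, s[i - 1][j - 1] + 1)
            s[i][j] = best
    return s[n][m]


if __name__ == "__main__":
    print(lcs_length("ATCTGAT", "TGCATA"))
```

For these two example strings, one longest common subsequence is TCTA.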
21. Extend LCS to Global Alignment
\[
s_{i,j} = \max
\begin{cases}
s_{i-1,j} + \delta(v_i, -) \\
s_{i,j-1} + \delta(-, w_j) \\
s_{i-1,j-1} + \delta(v_i, w_j)
\end{cases}
\]
\(\delta(v_i, -) = \delta(-, w_j) = -\sigma\) = fixed gap penalty
\(\delta(v_i, w_j)\) = score for match or mismatch; can be fixed, or taken from PAM or BLOSUM
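As a rough illustration of the global alignment recurrence with a fixed gap penalty, the sketch below computes only the alignment score (no traceback). The function name and the default values `match=1`, `mismatch=-1`, `gap=2` are hypothetical choices for the example; a PAM or BLOSUM lookup could replace the fixed match/mismatch score.

```python
def global_alignment_score(v: str, w: str,
                           match: int = 1, mismatch: int = -1,
                           gap: int = 2) -> int:
    """Score of the best end-to-end alignment of v and w."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap penalty per character.
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] - gap
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j] - gap,       # v[i-1] aligned to a gap
                          s[i][j - 1] - gap,       # w[j-1] aligned to a gap
                          s[i - 1][j - 1] + pair)  # v[i-1] aligned to w[j-1]
    return s[n][m]


if __name__ == "__main__":
    print(global_alignment_score("AC", "AGC"))
```

With these assumed scores, aligning AC against AGC gives two matches and one gap, for a total of 0.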
22. Global Alignment Alternatives
Ends-free alignment: don't penalize gaps at the beginning or end
  Initialize first row and column of S to 0
  Search last row and column for maximum score
Regular global alignment: score end to end (penalize gaps at beginning and end)
  Initialize first row and column of S with gap penalty
  Alignment score is in the lower right corner of S
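The two alternatives differ only in how the score matrix is initialized and where the final score is read. A minimal sketch follows, assuming a hypothetical `ends_free` flag and illustrative scores (`match=1`, `mismatch=-1`, `gap=2`) that are not taken from the slides.

```python
def alignment_score(v: str, w: str, ends_free: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = 2) -> int:
    """Global alignment score, optionally without penalizing end gaps."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    if not ends_free:
        # Regular global alignment: leading gaps are penalized.
        for i in range(1, n + 1):
            s[i][0] = -gap * i
        for j in range(1, m + 1):
            s[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j] - gap,
                          s[i][j - 1] - gap,
                          s[i - 1][j - 1] + pair)
    if ends_free:
        # Trailing gaps are free: take the best score in the last row or column.
        return max(max(s[n]), max(row[m] for row in s))
    return s[n][m]


if __name__ == "__main__":
    print(alignment_score("ACGT", "TTACGTTT"))
    print(alignment_score("ACGT", "TTACGTTT", ends_free=True))
```

In this example the short sequence sits inside the long one, so the regular score is pulled down by four end gaps while the ends-free score counts only the four matches.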
23. Historical Perspective: Needleman-Wunsch (1 of 3)
24. Historical Perspective: Needleman-Wunsch (2 of 3)
25. Historical Perspective: Needleman-Wunsch (3 of 3)
26. Smith-Waterman Algorithm
Advances in Applied Mathematics, 2:482-489 (1981)
27. Smith-Waterman (cont. 1)
30. Calculation of similarity score and percent similarity
31. Extend LCS to Local Alignment
\[
s_{i,j} = \max
\begin{cases}
0 & \text{(no negative scores)} \\
s_{i-1,j} + \delta(v_i, -) \\
s_{i,j-1} + \delta(-, w_j) \\
s_{i-1,j-1} + \delta(v_i, w_j)
\end{cases}
\]
\(\delta(v_i, -) = \delta(-, w_j) = -\sigma\) = fixed gap penalty
\(\delta(v_i, w_j)\) = score for match or mismatch; can be fixed, or taken from PAM or BLOSUM
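The local alignment recurrence adds a zero option so that a poor prefix never drags down a later good region, and the answer is the largest entry anywhere in the matrix. Below is a minimal score-only sketch; the function name and the scores `match=1`, `mismatch=-1`, `gap=2` are assumptions for illustration.

```python
def local_alignment_score(v: str, w: str,
                          match: int = 1, mismatch: int = -1,
                          gap: int = 2) -> int:
    """Score of the best-scoring pair of substrings of v and w."""
    n, m = len(v), len(w)
    # Row 0 and column 0 stay 0: a local alignment may start anywhere.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(0,                       # restart the alignment here
                          s[i - 1][j] - gap,
                          s[i][j - 1] - gap,
                          s[i - 1][j - 1] + pair)
            best = max(best, s[i][j])
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTTACGTTT", "GGACGGG"))
```

Here the shared region ACG scores 3, and extending it in either direction only lowers the score.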
32. Historical Perspective: Smith-Waterman (1 of 3)
33. Historical Perspective: Smith-Waterman (2 of 3)
34. Historical Perspective: Smith-Waterman (3 of 3)
35. The E value (false positive expectation value)
The Expect value (E) describes the number of "hits" one can expect to see just by chance when searching a database of a particular size. It decreases exponentially as the similarity score (S) increases: the higher the similarity score, the lower the E value. Essentially, the E value describes the random background noise that exists for matches between two sequences. It is used as a convenient way to set a significance threshold for reporting results. When the E-value threshold is raised above the default of 10 before a sequence search, a longer list containing more low-scoring hits can be reported. An E value of 1 assigned to a hit means that, in a database of the current size, you might expect to see 1 match with a similar score simply by chance.
36. E value (Karlin-Altschul statistics)
\[
E = K \cdot m \cdot n \cdot e^{-\lambda S}
\]
where \(K\) is a constant, \(m\) is the length of the query sequence, \(n\) is the length of the database sequence, \(\lambda\) is the decay constant, and \(S\) is the similarity score.
If S increases, E decreases exponentially.
If the decay constant increases, E decreases exponentially.
If m•n increases, the "search space" increases and there is a greater chance of a random "hit", so E increases. A larger database will increase E. However, a larger query sequence often decreases E. Why?
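The Karlin-Altschul formula is simple to evaluate directly. In the sketch below, the function name `expect_value` and every numeric value used in the usage example (including the constant and the decay constant) are arbitrary placeholders chosen for illustration, not parameters from any real scoring system.

```python
import math


def expect_value(score: float, m: int, n: int, k: float, lam: float) -> float:
    """E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    k, lam = 0.1, 0.3
    m, n = 250, 1_000_000
    for score in (40, 50, 60):
        print(score, expect_value(score, m, n, k, lam))
    # Doubling the search space m*n doubles E at a fixed score.
    print(expect_value(50, m, 2 * n, k, lam) / expect_value(50, m, n, k, lam))
```

Each one-point increase in the score multiplies E by exp(-lambda), while E grows only linearly with m•n.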
37. Project Teams and Presentation Assignments
Base Project (Global Alignment):
Larry and Darnell
Extension 1 (Ends-Free Global Alignment):
Steven and Charlie
Extension 2 (Local Alignment):
Olivera and Natalia
Extension 3 (Local Alignment – all):
Brittany and Alana
Extension 4 (Database):
Nathaniel and Anna U.
Extension 5 (Space Efficient Algorithm):
David and Shilpa
Extension 6 (Affine Gap Penalty):
Rachel and Anna P.
Extension 7 (Hirschberg’s Algorithm):
Wendy and Andrew
38. Workshop
Meet with your group and develop the overall structure of your program
High-level algorithm
Identify the modules, functions (including parameters), and global variables
Determine who is responsible for each module
Devise a development timeline and a testing strategy