
Developing Pairwise Sequence Alignment Algorithms


Presentation Transcript


    1. Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez

    2. Outline

        - Overview of global and local alignment
        - References for sequence alignment algorithms
        - Discussion of Needleman-Wunsch iterative approach to global alignment
        - Discussion of Smith-Waterman recursive approach to local alignment
        - Discussion of how the LCS algorithm can be extended for
            - Global alignment (Needleman-Wunsch)
            - Local alignment (Smith-Waterman)
        - Group assignments for project

    3. Overview of Pairwise Sequence Alignment

        - Dynamic programming
            - Applied to optimization problems
            - Useful when
                - The problem can be recursively divided into sub-problems
                - The sub-problems are not independent
        - Needleman-Wunsch is a global alignment technique that uses an iterative algorithm and no gap penalty (it could be extended to a fixed gap penalty).
        - Smith-Waterman is a local alignment technique that uses a recursive algorithm and can use alternative gap penalties (such as affine).
        - Smith-Waterman's algorithm is an extension of the Longest Common Subsequence (LCS) problem and can be generalized to solve both local and global alignment.
        - Note: "Needleman-Wunsch" is usually used to refer to global alignment regardless of the algorithm used.
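
The two conditions for dynamic programming listed on this slide (recursive decomposition, sub-problems that are not independent) can be illustrated with a small top-down sketch. It uses the LCS problem as the example; the function name `lcs_top_down` and the use of `functools.lru_cache` for memoization are choices made here for illustration and do not come from the slides.

```python
from functools import lru_cache


def lcs_top_down(v: str, w: str):
    """Return (LCS length, number of cache hits) for strings v and w.

    A cache hit means a sub-problem was requested again after it had
    already been solved, i.e. the sub-problems overlap.
    """

    @lru_cache(maxsize=None)
    def s(i: int, j: int) -> int:
        # s(i, j) = LCS length of the prefixes v[:i] and w[:j]
        if i == 0 or j == 0:
            return 0
        best = max(s(i - 1, j), s(i, j - 1))
        if v[i - 1] == w[j - 1]:
            best = max(best, s(i - 1, j - 1) + 1)
        return best

    length = s(len(v), len(w))
    return length, s.cache_info().hits


if __name__ == "__main__":
    length, hits = lcs_top_down("ATCTGAT", "TGCATA")
    print("LCS length:", length, "reused sub-problems:", hits)
```

Because the recursion depth grows with the sequence lengths, a sketch like this is only suitable for short inputs; an iterative table fill avoids that limit.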

    4. References

        - http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
        - An Introduction to Bioinformatics Algorithms (Computational Molecular Biology), Neil C. Jones and Pavel Pevzner
        - Computational Molecular Biology: An Algorithmic Approach, Pavel Pevzner
        - Introduction to Computational Biology: Maps, Sequences, and Genomes, Michael Waterman
        - Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield

    5. Classic Papers

        - Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970. (http://www.cs.umd.edu/class/spring2003/cmsc838t/papers/needlemanandwunsch1970.pdf)
        - Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981. (http://www.cmb.usc.edu/papers/msw_papers/msw-042.pdf)

    6. Why search sequence databases?

        - I have just sequenced something. What is known about the thing I sequenced?
        - I have a unique sequence. Is there similarity to another gene that has a known function?
        - I found a new protein sequence in a lower organism. Is it similar to a protein from another species?

    7. Global Alignment Method

    8. Global Alignment Method (cont. 1)

    9. Global Alignment Method (cont. 2)

    10. Global Alignment Method (cont. 3)

    11. Global Alignment Method (cont. 4)

    12. Global Alignment Method (cont. 5)

    13. Three steps in Dynamic Programming

    19. Global Alignment output file

    20. LCS Problem (review)

        Similarity score:

        $$
        s_{i,j} = \max \begin{cases}
        s_{i-1,j} \\
        s_{i,j-1} \\
        s_{i-1,j-1} + 1, & \text{if } v_i = w_j
        \end{cases}
        $$
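
The LCS recurrence on this slide can be filled in iteratively, row by row. The following is a minimal sketch, assuming 0-based Python strings and a table with an extra zero row and column for the empty prefixes; the name `lcs_length` is chosen here for illustration.

```python
def lcs_length(v: str, w: str) -> int:
    """Length of the longest common subsequence of v and w."""
    n, m = len(v), len(w)
    # s[i][j] holds the score for the prefixes v[:i] and w[:j];
    # row 0 and column 0 stay at 0 (empty prefix).
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(s[i - 1][j], s[i][j - 1])
            if v[i - 1] == w[j - 1]:
                best = max(best, s[i - 1][j - 1] + 1)
            s[i][j] = best
    return s[n][m]


if __name__ == "__main__":
    print(lcs_length("ATCTGAT", "TGCATA"))
```

The table uses O(n·m) time and space, where n and m are the two sequence lengths.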

    21. Extend LCS to Global Alignment

        $$
        s_{i,j} = \max \begin{cases}
        s_{i-1,j} + \delta(v_i, -) \\
        s_{i,j-1} + \delta(-, w_j) \\
        s_{i-1,j-1} + \delta(v_i, w_j)
        \end{cases}
        $$

        - $\delta(v_i, -) = \delta(-, w_j) = -\sigma$, where $\sigma$ is the fixed gap penalty
        - $\delta(v_i, w_j)$ is the score for a match or mismatch; it can be fixed or taken from a PAM or BLOSUM matrix
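
A possible implementation of this global alignment recurrence is sketched below. It assumes a fixed match score, a fixed mismatch score, and a positive fixed gap penalty that is subtracted for every gap position; the default values (1, -1, 2) and the name `global_align` are illustrative assumptions, not values from the slides. The first row and column are initialized with accumulated gap penalties, and a traceback from the lower right corner recovers one optimal alignment.

```python
def global_align(v: str, w: str, match: int = 1, mismatch: int = -1, gap: int = 2):
    """Return (score, aligned_v, aligned_w) for an end-to-end alignment."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    # Leading gaps are penalized: first column and first row accumulate -gap.
    for i in range(1, n + 1):
        s[i][0] = -gap * i
    for j in range(1, m + 1):
        s[0][j] = -gap * j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(
                s[i - 1][j] - gap,      # v[i-1] aligned with a gap
                s[i][j - 1] - gap,      # w[j-1] aligned with a gap
                s[i - 1][j - 1] + sub,  # v[i-1] aligned with w[j-1]
            )

    # Traceback from the lower right corner to the upper left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if v[i - 1] == w[j - 1] else mismatch
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub:
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] - gap:
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return s[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, a, b = global_align("GATTACA", "GATACA")
    print(score)
    print(a)
    print(b)
```

When several alignments share the optimal score, this traceback prefers the diagonal move, then a gap in the second sequence; other tie-breaking orders are equally valid. Replacing the fixed match/mismatch score with a lookup into a PAM or BLOSUM matrix would only change how `sub` is computed.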

    22. Global Alignment Alternatives

        - Ends-free alignment: don't penalize gaps at the beginning or end
            - Initialize first row and column of S to 0
            - Search last row and column for maximum score
        - Regular global alignment: score end to end (penalize gaps at beginning and end)
            - Initialize first row and column of S with gap penalty
            - Alignment score is in the lower right corner of S
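
The ends-free variant differs from regular global alignment only in the initialization and in where the final score is read. A minimal sketch, assuming the same illustrative scoring defaults as a simple fixed-score scheme (match 1, mismatch -1, gap penalty 2) and a hypothetical function name `ends_free_score`:

```python
def ends_free_score(v: str, w: str, match: int = 1, mismatch: int = -1, gap: int = 2) -> int:
    """Best alignment score when gaps at either end are not penalized."""
    n, m = len(v), len(w)
    # First row and first column stay at 0: leading gaps are free.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(
                s[i - 1][j] - gap,
                s[i][j - 1] - gap,
                s[i - 1][j - 1] + sub,
            )
    # Trailing gaps are free: take the maximum over the last row and last column.
    last_row_best = max(s[n])
    last_col_best = max(row[m] for row in s)
    return max(last_row_best, last_col_best)


if __name__ == "__main__":
    # A short sequence contained in a longer one scores as a full match.
    print(ends_free_score("ACGT", "TTACGTTT"))
```

With the same scoring, a regular global alignment of these two sequences would pay for four gap positions, which is the situation ends-free alignment is meant to avoid.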

    23. Historical Perspective: Needleman-Wunsch (1 of 3)

    24. Historical Perspective: Needleman-Wunsch (2 of 3)

    25. Historical Perspective: Needleman-Wunsch (3 of 3)

    26. Smith-Waterman Algorithm. Advances in Applied Mathematics, 2:482-489 (1981)

    27. Smith-Waterman (cont. 1)

    30. Calculation of similarity score and percent similarity

    31. Extend LCS to Local Alignment

        $$
        s_{i,j} = \max \begin{cases}
        0 & \text{(no negative scores)} \\
        s_{i-1,j} + \delta(v_i, -) \\
        s_{i,j-1} + \delta(-, w_j) \\
        s_{i-1,j-1} + \delta(v_i, w_j)
        \end{cases}
        $$

        - $\delta(v_i, -) = \delta(-, w_j) = -\sigma$, where $\sigma$ is the fixed gap penalty
        - $\delta(v_i, w_j)$ is the score for a match or mismatch; it can be fixed or taken from a PAM or BLOSUM matrix
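
The local alignment recurrence adds a zero option so that no cell goes negative, and the best score can occur anywhere in the table. The sketch below assumes fixed match and mismatch scores and a positive fixed gap penalty; the defaults (1, -1, 2) and the name `local_align` are illustrative assumptions. The traceback starts at the highest-scoring cell and stops at the first cell with score 0.

```python
def local_align(v: str, w: str, match: int = 1, mismatch: int = -1, gap: int = 2):
    """Return (score, aligned_v, aligned_w) for one best local alignment."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(
                0,                      # start a new local alignment here
                s[i - 1][j] - gap,
                s[i][j - 1] - gap,
                s[i - 1][j - 1] + sub,
            )
            if s[i][j] > best:
                best, best_i, best_j = s[i][j], i, j

    # Traceback from the best cell until a zero score is reached.
    top, bottom = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and s[i][j] > 0:
        sub = match if v[i - 1] == w[j - 1] else mismatch
        if s[i][j] == s[i - 1][j - 1] + sub:
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j] - gap:
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(local_align("XXXGATTACAYYY", "ZZGATTACAWW"))
```

If several cells share the maximum score, this sketch reports only the first one found in row-major order; reporting all of them would require collecting every maximal cell before the traceback.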

    32. Historical Perspective: Smith-Waterman (1 of 3)

    33. Historical Perspective: Smith-Waterman (2 of 3)

    34. Historical Perspective: Smith-Waterman (3 of 3)

    35. The E value (false positive expectation value)

        The Expect value (E) is a parameter that describes the number of "hits" one can "expect" to see just by chance when searching a database of a particular size. It decreases exponentially as the Similarity Score (S) increases (inverse relationship). The higher the Similarity Score, the lower the E value. Essentially, the E value describes the random background noise that exists for matches between two sequences.

        The E value is used as a convenient way to create a significance threshold for reporting results. When the E value is increased from the default value of 10 prior to a sequence search, a larger list with more low-similarity scoring hits can be reported. An E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size you might expect to see 1 match with a similar score simply by chance.

    36. E value (Karlin-Altschul statistics)

        $$
        E = K \cdot m \cdot n \cdot e^{-\lambda S}
        $$

        where K is a constant, m is the length of the query sequence, n is the length of the database sequence, $\lambda$ is the decay constant, and S is the similarity score.

        - If S increases, E decreases exponentially.
        - If the decay constant increases, E decreases exponentially.
        - If m·n increases, the "search space" increases and there is a greater chance for a random "hit", so E increases.
        - A larger database will increase E. However, a larger query sequence often decreases E. Why?
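
The behavior of the E value formula on this slide can be checked numerically with a small sketch. The function name `expect_value` and the parameter name `lam` (for the decay constant) are chosen here for illustration, and the values of K and the decay constant used in the demo are arbitrary placeholders, not parameters of any real scoring system.

```python
import math


def expect_value(score: float, m: int, n: int, K: float, lam: float) -> float:
    """E = K * m * n * exp(-lam * score).

    m is the query length, n the database sequence length, K a constant,
    and lam the decay constant; all values are supplied by the caller.
    """
    return K * m * n * math.exp(-lam * score)


if __name__ == "__main__":
    # Placeholder constants, assumed purely for demonstration.
    K, lam = 0.1, 0.3
    for score in (20, 40, 60):
        print(score, expect_value(score, m=100, n=1_000_000, K=K, lam=lam))
```

Holding everything else fixed, doubling n doubles E, while each additional unit of score multiplies E by the constant factor exp(-lam), which is the exponential decrease described on the slide.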

    37. Project Teams and Presentation Assignments

        - Base Project (Global Alignment): Larry and Darnell
        - Extension 1 (Ends-Free Global Alignment): Steven and Charlie
        - Extension 2 (Local Alignment): Olivera and Natalia
        - Extension 3 (Local Alignment, all): Brittany and Alana
        - Extension 4 (Database): Nathaniel and Anna U.
        - Extension 5 (Space Efficient Algorithm): David and Shilpa
        - Extension 6 (Affine Gap Penalty): Rachel and Anna P.
        - Extension 7 (Hirschberg's Algorithm): Wendy and Andrew

    38. Workshop

        - Meet with your group and develop the overall structure of your program
            - High-level algorithm
            - Identify the modules, functions (including parameters), and global variables
        - Determine who is responsible for each module
        - Devise a development timeline and a testing strategy
