1 / 31

Pairwise Sequence Alignment

Pairwise Sequence Alignment. CISC 4020 Bioinformatics Spring 2013 Department of Computer and Information Science. Outline. Overview and examples Definitions: homologs, paralogs, orthologs Assigning scores to aligned amino acids: Dayhoff’s PAM matrices

mirari
Download Presentation

Pairwise Sequence Alignment

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pairwise Sequence Alignment CISC 4020 Bioinformatics Spring 2013 Department of Computer and Information Science

  2. Outline • Overview and examples • Definitions: homologs, paralogs, orthologs • Assigning scores to aligned amino acids: Dayhoff’s PAM matrices • Alignment algorithms: Needleman-Wunsch, Smith-Waterman CISC 4020 Bioinformatics

  3. Two kinds of sequence alignment: global and local • The global alignment algorithm of Needleman and Wunsch (1970). • The local alignment algorithm of Smith and Waterman (1981). CISC 4020 Bioinformatics

  4. Optimal alignments are guaranteed to be found. To achieve the best score by adding gaps when needed allowing for conservative substitutions choosing a scoring system (simple or complicated) Global alignment with the algorithmof Needleman and Wunsch (1970) CISC 4020 Bioinformatics

  5. Three steps of N-W • Setting up a matrix • Two sequences can be compared in a matrix along x- and y-axes. • Scoring the matrix • Identifying the optimal alignment • If they are identical, a path along a diagonal can be drawn • Find the optimal subpaths CISC 4020 Bioinformatics

  6. Four possible outcomes in aligning two sequences 1 2 [1] identity (stay along a diagonal) [2] mismatch (stay along a diagonal) [3] gap in one sequence (move vertically) [4] gap in the other sequence (move horizontally) CISC 4020 Bioinformatics

  7. CISC 4020 Bioinformatics

  8. Start Needleman-Wunsch with an identity matrix CISC 4020 Bioinformatics

  9. Start Needleman-Wunsch with an identity matrix (Blosum62) CISC 4020 Bioinformatics

  10. Set up a scoring matrix, distinct from the identity matrix, then fill in the matrix using “dynamic programming”. CISC 4020 Bioinformatics

  11. Fill in the matrix using “dynamic programming” CISC 4020 Bioinformatics

  12. Fill in the matrix using “dynamic programming” CISC 4020 Bioinformatics

  13. Fill in the matrix using “dynamic programming” CISC 4020 Bioinformatics

  14. Fill in the matrix using “dynamic programming” CISC 4020 Bioinformatics

  15. Fill in the matrix using “dynamic programming” CISC 4020 Bioinformatics

  16. Fill in the matrix using “dynamic programming” CISC 4020 Bioinformatics

  17. Traceback to find the optimal (best) pairwise alignment CISC 4020 Bioinformatics

  18. Needleman-Wunsch: dynamic programming • N-W is guaranteed to find optimal alignments, although the algorithm does not search all possible alignments. • It is an example of a dynamic programming algorithm: an optimal path (alignment) is identified by incrementally extending optimal sub-paths. Thus, a series of decisions is made at each step of the alignment to find the pair of residues with the best score. CISC 4020 Bioinformatics

  19. European Bioinformatics Institute (EBI) Try using needle to implement a Needleman-Wunsch global alignment algorithm to find the optimum alignment (including gaps): http://www.ebi.ac.uk/Tools/psa/emboss_needle/ CISC 4020 Bioinformatics Page 81

  20. Queries: beta globin (NP_000509) alpha globin (NP_000549)

  21. Global alignment vs. local alignment • Global alignment (Needleman-Wunsch) extends from one end of each sequence to the other. • Local alignment finds optimally matching regions within two sequences (“subsequences”). • Local alignment is almost always used for database searches such as BLAST. It is useful to find domains (or limited regions of homology) within sequences. • Smith and Waterman (1981) solved the problem of performing optimal local sequence alignment. Other methods (BLAST, FASTA) are faster but less thorough. CISC 4020 Bioinformatics

  22. Global alignment (top) includes matches ignored by local alignment (bottom) 15% identity 30% identity CISC 4020 Bioinformatics NP_824492, NP_337032

  23. How the Smith-Waterman algorithm works • Set up a matrix between two proteins (size m+1, n+1) • No values in the scoring matrix can be negative! S > 0 • The score in each cell is the maximum of four values: • s(i-1, j-1) + the new score at [i,j] (a match or mismatch) • s(i,j-1) – gap penalty • s(i-1,j) – gap penalty • zero CISC 4020 Bioinformatics

  24. Find the path • The highest value in the matrix represents the end of the alignment. • It is not necessarily at the lower right corner. • The trace-back procedure begins with this highest value position and proceeds diagonally up to the left stopping when a cell is reached with a value of zero. This defines the start of the alignment. • The start is not necessarily at the extreme top left of the matrix.

  25. Smith-Waterman algorithm allows the alignment of subsets of sequences S1: GCC- UCG S2: GCCAUUG CISC 4020 Bioinformatics

  26. UVA FASTA Server Try using SSEARCH to perform a rigorous Smith-Waterman local alignment: http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml CISC 4020 Bioinformatics

  27. Queries: beta globin (NP_000509) alpha globin (NP_000549)

  28. Rapid, heuristic versions of Smith-Waterman: FASTA and BLAST • Smith-Waterman is very rigorous and it is guaranteed to find an optimal alignment. • But Smith-Waterman is slow. It requires computer space and time proportional to the product of the two sequences being aligned (or the product of a query against an entire database). • Gotoh (1982) and Myers and Miller (1988) improved the algorithms so both global and local alignment require less time and space. • FASTA and BLAST provide rapid alternatives to S-W. CISC 4020 Bioinformatics

More Related