Pairwise Sequence Alignment

Presentation Transcript


  1. Pairwise Sequence Alignment Presented by Liu Qi

  2. Why align sequences? • Functional predictions based on identifying homologues. • Assumes: conservation of sequence implies conservation of function • BUT: function is carried out at the level of proteins, i.e. 3-D structure, whereas sequence conservation is observed at the level of DNA, i.e. 1-D sequence

  4. Some Definitions • An alignment is a mutual arrangement of two sequences that shows where the two sequences are similar and where they differ. • An optimal alignment is one that exhibits the most correspondences and the fewest differences. It is the alignment with the highest score. It may or may not be biologically meaningful.

  5. Methods • Dot matrix • Dynamic Programming • Word, k-tuple (heuristic based)

  6. Brief intro of methods • Dot matrix: all possible matches between sequence residues are found; used to compare two sequences to look for regions where they may align; very useful for finding indels and repeats in sequences; can be used as a first pass to see whether there is any similarity between sequences • Dynamic programming: mathematically guaranteed to find the optimal alignment (global or local) between a pair of sequences; computationally expensive, because the number of steps grows with the product of the two sequence lengths (one matrix cell per pair of positions) • k-tuple (word) methods: used by FASTA and BLAST (previously described); much faster than dynamic programming and ideal for database searches; use heuristics that do not guarantee the optimal alignment but are nevertheless very reliable

  7. Dot matrix 1 - list one sequence along the top of the page and the second sequence down the side 2 - move across each row and put a dot in every column where the characters are the same 3 - continue for each row until all possible character matches between the sequences are represented by dots 4 - diagonal runs of dots reveal sequence similarity (repeats and inverted repeats can also be found off the main diagonal) 5 - isolated dots represent random similarity unrelated to the alignment
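
The steps above can be sketched in a few lines of Python. This is a minimal illustration that assumes plain identity matching, with no scoring matrix and no window. The function names dot_matrix and render are hypothetical.

```python
def dot_matrix(top: str, side: str) -> list[list[bool]]:
    """Steps 1-3: one sequence across the top, one down the side;
    cell (row, column) is True where the two characters are identical."""
    return [[s == t for t in top] for s in side]


def render(top: str, side: str) -> str:
    """Draw the matrix as text: '*' for a dot, '.' for no match."""
    lines = ["  " + top]  # header row aligned with the "X " row prefixes
    for s, row in zip(side, dot_matrix(top, side)):
        lines.append(s + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("GATTACA", "GATTGCA"))
```

In this example, the unbroken run of `*` starting at the top-left corner marks the shared prefix GATT. The scattered single dots elsewhere are the random matches described in step 5.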

  9. Dot matrix with noise reduction

  10. Dot matrix • To improve the visualisation of identical regions among sequences, we use sliding windows instead of writing down a dot for every character common to both sequences. • We compare a number of positions at a time (the window size) and write down a dot whenever at least a minimum number of them (the stringency) are identical.

  11. Dot matrix • Caution is necessary when choosing the window size and the stringency value. Different problems generally call for different values; well-chosen values accentuate the regions of similarity between the two sequences. • For DNA sequences, typically: sliding window = 15, stringency = 10 • For protein sequences: sliding window = 2 or 3, stringency = 2
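
The sliding-window filter described on slides 10 and 11 can be sketched as follows. Two assumptions apply. First, the dot is placed at the start of each window pair; some programs mark the window centre instead. Second, only identities are counted, with no scoring matrix. The name windowed_dot_matrix is hypothetical.

```python
def windowed_dot_matrix(top: str, side: str, window: int, stringency: int) -> list[list[bool]]:
    """Dot at (i, j) when the window of `side` starting at i and the window of
    `top` starting at j agree in at least `stringency` of `window` positions."""
    rows = []
    for i in range(len(side) - window + 1):
        row = []
        for j in range(len(top) - window + 1):
            # Count identical characters position by position inside the window.
            identical = sum(side[i + k] == top[j + k] for k in range(window))
            row.append(identical >= stringency)
        rows.append(row)
    return rows


# Small illustrative values; the slide suggests 15/10 for DNA and 2-3/2 for protein.
for row in windowed_dot_matrix("GATTACAGATTACA", "GATTACAGATTACA", window=3, stringency=3):
    print("".join("*" if hit else "." for hit in row))
```

Raising the stringency relative to the window size removes isolated chance matches. Setting it too high also erases short genuine similarities, which is why the values depend on the problem.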

  12. Things to be considered • Scoring matrix for distance correction. • Window size • Threshold

  13. Uses of the dot plot • Regions of similarity: diagonals • Insertions/deletions: gaps • Can determine intron/exon structure • Repeats: parallel diagonals • Inverted repeats: perpendicular diagonals • Can be used to determine regions of base pairing of RNA molecules

  14. Intra-sequence comparison • Repeats • Inverted repeats • Low complexity

  15. Examples • ABRACADABRACAD
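
One way to make the repeat in ABRACADABRACAD concrete is to count the dots on each off-diagonal of the sequence's dot plot against itself. A tandem repeat of period p shows up as a densely dotted diagonal at offset p. The helper below is a hypothetical illustration, not a standard program. Raw counts also favour short offsets, because those diagonals are longer, so a real tool would normalise the counts or use windows.

```python
def repeat_offsets(seq: str) -> dict[int, int]:
    """Number of dots on each off-diagonal of the self-comparison dot plot:
    offset d counts the positions i with seq[i] == seq[i + d]."""
    return {d: sum(seq[i] == seq[i + d] for i in range(len(seq) - d))
            for d in range(1, len(seq))}


counts = repeat_offsets("ABRACADABRACAD")
best = max(counts, key=counts.get)   # most densely dotted off-diagonal
print(best, counts[best])            # prints: 7 7
```

The diagonal at offset 7 is completely filled, which corresponds to the two consecutive copies of ABRACAD.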

  16. Palindrome • Sequence: ATOYOTA

  17. Repeats • Drosophila melanogaster SLIT protein compared with itself

  18. Low complexity

  19. Inter-sequence comparison • Conserved domains • Insertion and deletion

  20. Insertion and deletion • Seq1: DOROTHYCROWFOOTHODGKIN • Seq2: DOROTHYHODGKIN
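
For this pair, the dot plot shows one diagonal for DOROTHY and a second, shifted diagonal for HODGKIN. The shift equals the length of the deleted segment. The hypothetical helper below lists maximal diagonal runs of dots as (start in Seq1, start in Seq2, length). It is a sketch that assumes exact identity matching.

```python
def diagonal_runs(a: str, b: str, min_len: int) -> list[tuple[int, int, int]]:
    """Maximal runs of identical characters along the diagonals of the
    a-vs-b dot matrix, as (start_in_a, start_in_b, length), length >= min_len."""
    runs = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Only start counting at the first dot of a diagonal run.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            length = 0
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


print(diagonal_runs("DOROTHYCROWFOOTHODGKIN", "DOROTHYHODGKIN", 4))
# [(0, 0, 7), (15, 7, 7)]
```

The two runs lie on the diagonals i - j = 0 and i - j = 8. The offset 8 is the length of CROWFOOT, the segment missing from Seq2.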

  21. Conserved domains

  22. Translated DNA and protein comparison: exons and introns

  24. Even more can be done with RNA • Comparing the reverse complement of an RNA sequence with the sequence itself can often be very informative. • Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast. • The sequence and structure of this molecule are both known; the illustration shows how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).
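
Here is a sketch of the reverse-complement comparison. It assumes only Watson-Crick pairs (A-U, G-C), ignores G-U wobble pairs, and uses a made-up hairpin rather than the real tRNA-Phe sequence. A dot at (i, j) in the plot of a sequence against its reverse complement means that position i can pair with position n - 1 - j. Diagonal runs are therefore candidate helical stems. The names reverse_complement and stem_candidates are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}   # Watson-Crick pairs only


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def stem_candidates(rna: str, min_len: int) -> list[tuple[int, int, int]]:
    """Diagonal runs in the dot plot of `rna` against its reverse complement,
    reported as (i, k, length): rna[i + t] can pair with rna[k - t]
    for t = 0 .. length - 1."""
    rc = reverse_complement(rna)
    n = len(rna)
    stems = []
    for i in range(n):
        for j in range(n):
            # Start only at the first dot of each diagonal run.
            if rna[i] != rc[j] or (i > 0 and j > 0 and rna[i - 1] == rc[j - 1]):
                continue
            length = 0
            while i + length < n and j + length < n and rna[i + length] == rc[j + length]:
                length += 1
            k = n - 1 - j   # rc[j] is the complement of rna[n - 1 - j]
            # Keep runs whose 5' arm ends before the 3' arm begins; this drops
            # the mirror-image copy of each stem and self-overlapping runs.
            if length >= min_len and i + length - 1 < k - length + 1:
                stems.append((i, k, length))
    return stems


print(stem_candidates("GCGCUUUUGCGC", 4))   # toy hairpin, not tRNA-Phe
```

For the toy hairpin GCGCUUUUGCGC, the only stem of length 4 or more pairs positions 0-3 with positions 11-8. It is returned as (0, 11, 4).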

  25. Structures of tRNA-Phe

  26. Comparison of the reverse complement of an RNA sequence with the sequence itself

  27. Programs for Dot Matrix • Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html • SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html • Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html • COMPARE, DOTPLOT in GCG

  28. Conclusion • Advantages: readily reveals the presence of insertions/deletions and of direct and inverted repeats, which are more difficult to find with the other, more automated methods. It lets your eyes and brain do the work, which is very efficient. • Disadvantages: most dot matrix computer programs do not show an actual alignment, and the method does not return a score indicating how 'optimal' a given alignment is.

  30. Dynamic Programming • Question answered: what is the optimal alignment (the one with the best score) of two sequences? • How many different alignments are there?
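
This sketch shows why enumerating every alignment is hopeless. It assumes that every distinct path through the DP grid counts as a different alignment, where each step is diagonal, vertical or horizontal. Under that assumption, the count obeys the same three-way recurrence as the alignment score itself. The name count_alignments is hypothetical. Other counting conventions, such as treating adjacent gaps in either order as one alignment, give smaller numbers, but those still grow exponentially.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n: int, m: int) -> int:
    """Number of paths through the DP grid for sequences of lengths n and m,
    where each step is diagonal (x_i with y_j), vertical (x_i with a gap)
    or horizontal (y_j with a gap)."""
    if n == 0 or m == 0:
        return 1   # only one way left: align the remainder against gaps
    return (count_alignments(n - 1, m - 1)
            + count_alignments(n - 1, m)
            + count_alignments(n, m - 1))


for length in (1, 2, 3, 5, 10):
    cells = (length + 1) ** 2   # cells the DP matrix has to fill
    print(length, count_alignments(length, length), cells)
```

For two 3-residue sequences there are already 63 alignments, and the count grows roughly exponentially with length. Dynamic programming fills only (n + 1)(m + 1) cells, which is why it remains practical.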

  31. Alignment methods with DP • Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along their entire length. • Local alignment: Smith-Waterman (1981) is a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.

  32. Dynamic Programming • A simple example [figure: weighted graph over nodes A to F with numbered edges]

  33. Exercise

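  34. Conditions for applying dynamic programming • Optimal substructure: every sub-policy of an optimal policy is itself optimal. • No after-effects: the states of earlier stages cannot directly influence future decisions • Trading space for time (overlapping subproblems)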

  35. Dynamic Programming

  39. DP Algorithm for Global Alignment • Two sequences X = x_1...x_n and Y = y_1...y_m • Let F(i, j) be the optimal alignment score of x_1...x_i and y_1...y_j (0 ≤ i ≤ n, 0 ≤ j ≤ m).

  40. DP in equation form
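
A standard form of the global-alignment recurrence is sketched here in the notation of slide 39. The linear gap penalty d is added as a negative number, as in the d = -5 example that follows. The name s(x_i, y_j) for the substitution score of aligning x_i with y_j is an assumption.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d,\\
F(i,j) &= \max\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) & \text{($x_i$ aligned with $y_j$)}\\
F(i-1,\,j) + d & \text{($x_i$ aligned with a gap)}\\
F(i,\,j-1) + d & \text{($y_j$ aligned with a gap)}
\end{cases}
\end{aligned}
```

The optimal global score is F(n, m), in the lower-right corner of the matrix.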

  41. A simple example • Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

  45. Traceback • Start from the lower right corner and trace back to the upper left. • Each arrow introduces one character at the end of each aligned sequence. • A horizontal move puts a gap in the left sequence. • A vertical move puts a gap in the top sequence. • A diagonal move uses one character from each sequence.
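
The fill and traceback rules above can be sketched as a short Needleman-Wunsch implementation. It makes three assumptions. It uses a single match score and a single mismatch score instead of a full substitution matrix. The linear gap penalty d is given as a negative number. The first sequence x is written down the left side, so a vertical move puts a gap in the top sequence y, as on the slide. When several moves tie, this sketch prefers diagonal, then vertical, then horizontal, so it reports only one of possibly several optimal alignments. The name needleman_wunsch is hypothetical.

```python
def needleman_wunsch(x: str, y: str, match: int, mismatch: int, d: int) -> tuple[int, str, str]:
    """Global alignment of x (down the left side) and y (across the top).
    d is a linear gap penalty given as a negative number."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d          # x_1..x_i aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * d          # y_1..y_j aligned entirely against gaps

    def s(i: int, j: int) -> int:
        return match if x[i - 1] == y[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(i, j),   # diagonal
                          F[i - 1][j] + d,             # vertical
                          F[i][j - 1] + d)             # horizontal

    # Traceback from the lower-right corner to the upper-left corner.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(i, j):
            ax.append(x[i - 1])          # diagonal: one character from each
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])          # vertical: gap in the top sequence y
            ay.append("-")
            i -= 1
        else:
            ax.append("-")               # horizontal: gap in the left sequence x
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("catgt", "acgctg", match=2, mismatch=-1, d=-1))
```

With the scoring of the exercise on slide 48 (match = 2, mismatch = -1, d = -1), this sketch reports an optimal global score of 2 for catgt versus acgctg.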

  47. A simple example • The two optimal alignments:
      AAG-    AAG-
      -AGC    A-GC

  48. Exercise • Find the global alignment of • X = catgt • Y = acgctg • Scores: gap d = -1, mismatch = -1, match = 2

  49. Answer

  50. Local alignment • A single-domain protein may be homologous to a region within a multi-domain protein. • In such cases, an alignment that spans the complete length of both sequences is usually not what is needed.
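
Here is a minimal Smith-Waterman sketch. It assumes single match and mismatch scores and a linear gap penalty d given as a negative number. Each cell is floored at 0. The traceback starts at the highest-scoring cell and stops when it reaches a cell scoring 0. The name smith_waterman is hypothetical, and the example sequences are made up.

```python
def smith_waterman(x: str, y: str, match: int, mismatch: int, d: int) -> tuple[int, str, str]:
    """Local alignment with a linear gap penalty d (a negative number)."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_i, best_j = 0, 0, 0

    def s(i: int, j: int) -> int:
        return match if x[i - 1] == y[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Flooring at 0 lets a local alignment start afresh anywhere.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(i, j),
                          H[i - 1][j] + d,
                          H[i][j - 1] + d)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until a cell scoring 0 is reached.
    ax, ay = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(i, j):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("KKKHEAGYYY", "PPHEAGPP", match=2, mismatch=-1, d=-1))
```

Here the result is (8, 'HEAG', 'HEAG'). Only the shared core is reported, and the unrelated flanking residues are left out, which is the behaviour the slide motivates.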
