Multiple sequence alignment methods

Corné Hoogendoorn, Denis Miretskiy

Presentation Transcript


  1. Multiple sequence alignment methods. Corné Hoogendoorn, Denis Miretskiy

  2. Overview • What a multiple alignment means • Scoring a multiple alignment • Break • Multidimensional dynamic programming • Progressive alignment methods

  3. What a multiple alignment means • Homologous residues are aligned in columns • Structurally homologous: residues occupy similar 3D structural positions • Evolutionarily homologous: residues diverge from a common ancestral residue

  4. Multiple alignment - issues • Except for nearly identical sequences, homologous positions cannot be identified unambiguously • A need to identify which alignment is best • Protein structures and sequences evolve • Structures of diverged sequences are not entirely superposable

  5. Multiple alignment - issues • In principle there is always an unambiguously correct evolutionary alignment • All sequences descend from a common ancestral sequence • In practice the evolutionary history is very hard to infer • A structural alignment is usually easier to construct

  6. Multiple alignment - issues • Sequence diverges even faster than structure • Structurally unalignable protein parts cannot be aligned by sequence either • Some parts are very well alignable • Use these parts to align whatever can be aligned • Disregard the rest when assessing alignment quality • Supposedly meaningless biases are omitted

  7. Scoring an alignment • Some positions are more conserved than others • Position-specific scoring • Sequences are not independent • Related to each other by a phylogenetic tree • Specify a complete probabilistic model of molecular sequence evolution

  8. Complete probabilistic model • Probabilities of all evolutionary events • Prior probability of root ancestral sequence • Probabilities of evolutionary change depend on evolutionary time • Position-specific structural and functional constraints • We just don’t have all the necessary data

  9. Workable approximations • Assume that all columns are statistically independent: $S(m) = G + \sum_i S(m_i)$ • $S(m)$: score for multiple alignment $m$ • $G$: gap score/penalty • $S(m_i)$: score for column $i$ in the multiple alignment $m$

  10. Scoring an alignment • Notations

  11. Minimum entropy: further simplification • We already assumed independence between columns • Complex statistical dependence between sequences (within columns) if their phylogenetic tree has many intermediate ancestors • We assume independence between and within columns

  12. Minimum entropy • Probability of column $m_i$: $P(m_i) = \prod_a p_{ia}^{c_{ia}}$, where $c_{ia}$ is the count of residue $a$ in column $i$ • The score of column $m_i$ can be defined as the negative logarithm: $S(m_i) = -\sum_a c_{ia} \log p_{ia}$ • $p_{ia}$ is a regularized probability estimate as used in chapter 5 • $S(m_i)$ is an entropy measure directly related to the Shannon entropy (chapter 11)
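
The minimum entropy column score can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenters' code: the function name `column_entropy_score` and its `pseudocount` and `alphabet` parameters are hypothetical, the residue probability is estimated as a count fraction (optionally smoothed with a uniform pseudocount as one simple form of regularization), and a gap character is treated like any other symbol.

```python
import math
from collections import Counter


def column_entropy_score(column, pseudocount=0.0, alphabet=None):
    """Return -sum_a c_a * log(p_a) for one alignment column.

    column      -- string or list with one symbol per sequence
    pseudocount -- hypothetical uniform pseudocount added to every symbol
    alphabet    -- symbols to smooth over; defaults to the symbols present
    """
    counts = Counter(column)
    if alphabet is None:
        alphabet = sorted(counts)
    total = sum(counts[a] for a in alphabet) + pseudocount * len(alphabet)
    score = 0.0
    for a in alphabet:
        c = counts[a]
        if c == 0:
            continue  # a symbol that never occurs contributes nothing
        p = (c + pseudocount) / total
        score -= c * math.log(p)
    return score


# A fully conserved column scores 0 with plain count fractions...
conserved = column_entropy_score("LLLL")
# ...but not once a pseudocount keeps probability mass on unseen residues.
smoothed = column_entropy_score("LLLL", pseudocount=1.0, alphabet="LG")
# A maximally mixed two-symbol column scores 4 * log(2).
mixed = column_entropy_score("LLGG")
```

With unsmoothed estimates only a perfectly conserved column reaches a score of 0; with any regularized estimate the score stays strictly positive, which is one way to approach the question raised on the example slides.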

  13. Example (1)

  14. Example (2)

  15. Example (3) • Will this ever be 0 in reality? Why (not)?

  16. Example (4)

  17. Minimum entropy • Very close to the HMM formulation • Choose the sequences carefully • Usually the sample of sequences is biased • Weighting schemes as discussed in chapter 5 are necessary • This partially compensates for the defects of the assumption of sequence independence

  18. Sum of pairs • Also assumes statistical independence between columns • Uses substitution matrices • Column score is the sum over all pairs of sequences: $S(m_i) = \sum_{k<l} s(m_i^k, m_i^l)$ • Scores $s(a,b)$ come from substitution matrices like PAM or BLOSUM • For simple linear gap costs, $s(a,-)$, $s(-,a)$ and $s(-,-)$ are defined, with $s(-,-) = 0$

  19. Sum of pairs • Substitution scores are usually log-odds scores for pairwise comparisons • For three sequences the SP score is $\log(p_{ab}/q_a q_b) + \log(p_{bc}/q_b q_c) + \log(p_{ac}/q_a q_c)$ • The proper three-way log-odds score would be $\log(p_{abc}/q_a q_b q_c)$ • Each sequence is scored as if it descended from the N-1 other sequences • Evolutionary events are over-counted

  20. Problem with SP scores • Consider an alignment of N sequences • All have leucine (L) at position i • Number of symbol pairs in the column: $N(N-1)/2$ • Score for an L-L pair according to the BLOSUM50 matrix: 5 • SP score of the column: $5 \times N(N-1)/2$

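  21. Problem with SP scores • What if one sequence has glycine (G) at i? • A G-L pair scores -4, a difference of 9 from L-L • N-1 pairs are affected, so the column score drops by $9(N-1)$ • The score is worse than the all-leucine column by a fraction $\frac{9(N-1)}{5N(N-1)/2} = \frac{18}{5N}$ • The relative penalty shrinks as N grows, although more aligned leucines are stronger evidence against the glycine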
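
The arithmetic behind this SP-score problem can be checked with a short Python sketch. Only the two BLOSUM50 entries quoted on the slides (L-L = 5, G-L = -4) are used; the dictionary `BLOSUM50_SUBSET` and the helper names `sp_column` and `relative_drop` are hypothetical names introduced for this illustration.

```python
from itertools import combinations

# Only the two BLOSUM50 entries quoted on the slides.
BLOSUM50_SUBSET = {("L", "L"): 5, ("G", "L"): -4}


def pair_score(a, b):
    """Look up a symmetric substitution score."""
    return BLOSUM50_SUBSET[tuple(sorted((a, b)))]


def sp_column(column):
    """Sum-of-pairs score of one column: sum over all unordered pairs."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def relative_drop(n):
    """Fraction by which one G lowers the score of an otherwise all-L column."""
    all_leucine = sp_column("L" * n)
    one_glycine = sp_column("G" + "L" * (n - 1))
    return (all_leucine - one_glycine) / all_leucine


drop_5 = relative_drop(5)    # 18 / (5 * 5)
drop_50 = relative_drop(50)  # 18 / (5 * 50)
```

The relative drop equals 18/(5N), so it becomes smaller as more leucines are added to the column, which is the counter-intuitive behaviour the slide points out.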

  22. What a multiple alignment means • Scoring a multiple alignment • Questions? • Break

  23. Multidimensional dynamic programming • We assume that columns of an alignment are statistically independent • Gaps are scored with a linear gap cost • Now we can calculate the overall score: $S(m) = \sum_i S(m_i)$, where $S(m_i)$ is the score for column $i$

  24. Calculating the overall score • Define $\alpha_{i_1, i_2, \ldots, i_N}$ as the maximum score of an alignment up to the subsequences ending with $x^1_{i_1}, x^2_{i_2}, \ldots, x^N_{i_N}$

  25. Multiple sequence alignment methods

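  26. Simple notation • Introduce $\Delta_i$, which is 0 or 1, and define the “product” $\Delta_i \cdot x = x$ if $\Delta_i = 1$ and $\Delta_i \cdot x = -$ (a gap) if $\Delta_i = 0$ • Now the recursion can be written as follows: $\alpha_{i_1, \ldots, i_N} = \max_{\Delta_1 + \cdots + \Delta_N > 0} \{ \alpha_{i_1 - \Delta_1, \ldots, i_N - \Delta_N} + S(\Delta_1 \cdot x^1_{i_1}, \ldots, \Delta_N \cdot x^N_{i_N}) \}$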

  27. Complexity of algorithm • The algorithm requires the computation of the whole dynamic programming matrix with $L_1 L_2 \cdots L_N$ entries • For each entry we have to consider $2^N - 1$ combinations of gaps in a column • Assume all sequences have roughly the same length $L$ • Memory complexity of the algorithm is $O(L^N)$ • Time complexity is $O(2^N L^N)$
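
The multidimensional recursion and its cost can be illustrated with a small Python sketch. It is an assumption-laden toy, not the presenters' implementation: columns are scored with a sum-of-pairs score using hypothetical constants `MATCH`, `MISMATCH` and `GAP` (a linear gap cost with gap-gap pairs scoring 0), and the names `gap_patterns`, `sp_column_score` and `msa_score` are introduced here only for illustration.

```python
from itertools import combinations, product

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring constants


def gap_patterns(n):
    """All 0/1 vectors of length n except all-zero: 2**n - 1 patterns."""
    return [d for d in product((0, 1), repeat=n) if any(d)]


def sp_column_score(column):
    """Sum-of-pairs score of one column with a linear gap cost."""
    score = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue  # gap against gap scores 0
        if a == "-" or b == "-":
            score += GAP
        else:
            score += MATCH if a == b else MISMATCH
    return score


def msa_score(seqs):
    """Optimal SP score of a global multiple alignment by full N-dim DP."""
    n = len(seqs)
    deltas = gap_patterns(n)
    alpha = {(0,) * n: 0}
    # product() yields cells in lexicographic order, so every predecessor
    # cell (each index reduced by 0 or 1) is filled before it is needed.
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if not any(cell):
            continue  # origin already initialised
        best = None
        for d in deltas:
            prev = tuple(c - di for c, di in zip(cell, d))
            if min(prev) < 0:
                continue
            # A 1 in the pattern emits a residue, a 0 emits a gap.
            column = [s[c - 1] if di else "-"
                      for s, c, di in zip(seqs, cell, d)]
            cand = alpha[prev] + sp_column_score(column)
            if best is None or cand > best:
                best = cand
        alpha[cell] = best
    return alpha[tuple(len(s) for s in seqs)]


score = msa_score(["AC", "AC", "AC"])
```

The dictionary `alpha` holds one entry per cell, i.e. the product of (length + 1) over all sequences, and each cell tries up to 2^N - 1 patterns, which is why the method is practical only for a handful of short sequences.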

  28. MSA • Let $a^{kl}$ denote the pairwise alignment between sequences k and l induced by the multiple alignment $a$ • The score of the complete alignment is given by $S(a) = \sum_{k<l} S(a^{kl})$ • Let $\hat{a}^{kl}$ be the optimal pairwise alignment of k, l • Obviously $S(a^{kl}) \le S(\hat{a}^{kl})$

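  29. Lower bound • Assume that we have a lower bound $\sigma(a)$ on the score of the optimal multiple alignment, so $\sigma(a) \le S(a) \le S(a^{kl}) - S(\hat{a}^{kl}) + \sum_{k'<l'} S(\hat{a}^{k'l'})$ • In other words $S(a^{kl}) \ge \beta^{kl}$ • Where $\beta^{kl} = \sigma(a) + S(\hat{a}^{kl}) - \sum_{k'<l'} S(\hat{a}^{k'l'})$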

  30. Lower bound • Now we only need to look at pairwise alignments of k and l that score at least $\beta^{kl}$ • We need to obtain $\sigma(a)$, and this can be done by using a progressive alignment algorithm
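
The per-pair bound follows directly from the lower bound and the optimal pairwise scores, as the following Python sketch shows. The function name `pair_bounds` and the example numbers are hypothetical; the lower bound passed in would in practice come from a heuristic alignment, as the slide suggests.

```python
def pair_bounds(sigma, best_pair_scores):
    """Bound each pairwise projection of the optimal multiple alignment.

    sigma            -- lower bound on the optimal multiple alignment score
    best_pair_scores -- dict mapping a pair (k, l) to the score of the
                        optimal pairwise alignment of sequences k and l
    Returns a dict mapping (k, l) to
        sigma + best_pair_scores[(k, l)] - sum of all optimal pair scores.
    """
    total = sum(best_pair_scores.values())
    return {pair: sigma + s - total for pair, s in best_pair_scores.items()}


# Made-up numbers for three sequences, purely for illustration.
bounds = pair_bounds(10, {(0, 1): 5, (0, 2): 4, (1, 2): 3})
```

The closer the heuristic lower bound is to the sum of the optimal pairwise scores, the tighter each bound becomes and the fewer pairwise alignments remain to be considered.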

  31. Restricted algorithm • For each pair k, l we can find the complete set $B^{kl}$ of coordinate pairs $(i_k, i_l)$ such that the best alignment of $x^k$ to $x^l$ through $(i_k, i_l)$ scores more than $\beta^{kl}$ • Now we only have to look at cells $(i_1, i_2, \ldots, i_N)$ which meet the following condition: • $(i_k, i_l)$ is in $B^{kl}$ for all k, l
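
One way to find the admissible coordinate pairs for a single pair of sequences is to combine a forward and a backward pairwise dynamic programming matrix. The Python sketch below makes several assumptions that are not in the slides: it uses hypothetical match/mismatch/gap scores, it treats "through (i, j)" as "the alignment path passes through DP cell (i, j)", and it keeps cells whose best score is at least the bound. The names `nw_matrix` and `cells_within_bound` are illustrative.

```python
def nw_matrix(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment DP matrix: F[i][j] is the best score of x[:i] vs y[:j]."""
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        F[i][0] = i * gap
    for j in range(1, len(y) + 1):
        F[0][j] = j * gap
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    return F


def cells_within_bound(x, y, bound):
    """Cells (i, j) whose best alignment path through them scores >= bound."""
    n, m = len(x), len(y)
    fwd = nw_matrix(x, y)
    # Aligning the reversed sequences gives the best scores for suffixes.
    rev = nw_matrix(x[::-1], y[::-1])
    return {(i, j)
            for i in range(n + 1)
            for j in range(m + 1)
            if fwd[i][j] + rev[n - i][m - j] >= bound}


tight = cells_within_bound("ACGT", "ACGT", 4)
loose = cells_within_bound("ACGT", "ACGT", -100)
```

With a tight bound only the main diagonal survives for two identical sequences, while a very loose bound keeps every cell. In the restricted multidimensional algorithm, a cell would be visited only if each of its pairwise projections lies in the corresponding set.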

  32. Multiple sequence alignment methods

  33. Progressive alignment methods • The algorithms differ in several ways • The order in which the alignments are done • Whether the progression involves only alignment of sequences to a single growing alignment or whether subfamilies are built up on a tree structure

  34. Feng-Doolittle progressive multiple alignment • Calculate a triangular matrix of N(N-1)/2 distances between all pairs of N sequences by standard pairwise alignment • Construct a guide tree from the distance matrix using the Fitch & Margoliash clustering algorithm • Starting from the first node added to the tree, align the child nodes • Repeat until all sequences have been aligned

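  35. Converting scores to distances • $D = -\log S_{\mathrm{eff}} = -\log \frac{S_{\mathrm{obs}} - S_{\mathrm{rand}}}{S_{\mathrm{max}} - S_{\mathrm{rand}}}$ • Where $S_{\mathrm{max}}$ is the maximum score • $S_{\mathrm{obs}}$ is the observed pairwise alignment score • $S_{\mathrm{rand}}$ is the expected score for aligning two random sequences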
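
The score-to-distance conversion is a one-line formula; a minimal Python sketch follows. The function name `feng_doolittle_distance` and the example numbers are hypothetical, and the natural logarithm is an assumption, since the slide does not state the base.

```python
import math


def feng_doolittle_distance(s_obs, s_max, s_rand):
    """Distance D = -log((s_obs - s_rand) / (s_max - s_rand)).

    s_obs  -- observed pairwise alignment score
    s_max  -- maximum attainable score for the pair
    s_rand -- expected score for aligning two random sequences
    """
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        raise ValueError("observed score is not above the random expectation")
    return -math.log(s_eff)


identical = feng_doolittle_distance(s_obs=100, s_max=100, s_rand=10)
halfway = feng_doolittle_distance(s_obs=55, s_max=100, s_rand=10)
```

Identical sequences get distance 0, and the distance grows without bound as the observed score approaches the score expected for random sequences, so such pairs need explicit handling.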

  36. Profile alignment • Linear gap scores can be included in the SP score: $s(-,a) = s(a,-) = -g$ and $s(-,-) = 0$ • Global alignment score: $\sum_i S(m_i) = \sum_i \sum_{k<l} s(m_i^k, m_i^l)$
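
When two existing alignments (profiles) are aligned to each other, the sum-of-pairs score of a merged column splits into the pairs inside the first profile, the pairs inside the second, and the cross pairs between them; only the cross term depends on how the two profiles are aligned. The Python sketch below checks this decomposition on one column. The scoring constants and the names `pair_score`, `sp` and `cross` are hypothetical.

```python
from itertools import combinations, product

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scores; GAP plays the role of -g


def pair_score(a, b):
    """Substitution score with linear gap cost and s(-,-) = 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sp(column):
    """Sum-of-pairs score over all unordered pairs within one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def cross(col1, col2):
    """Score of all pairs with one symbol from each profile column."""
    return sum(pair_score(a, b) for a, b in product(col1, col2))


col1, col2 = "AA-", "AC"   # one column from each of two profiles
merged = sp(col1 + col2)   # SP score of the merged column
parts = sp(col1) + sp(col2) + cross(col1, col2)
```

Because the two within-profile terms are fixed by the existing alignments, a profile-profile dynamic programming step only needs to optimise the cross term.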

  37. CLUSTALW progressive alignment • Construct a distance matrix of all N(N-1)/2 pairs by pairwise dynamic programming alignment • Construct a guide tree by a neighbor-joining clustering algorithm (Saitou & Nei) • Progressively align at nodes in order of decreasing similarity, using sequence-sequence, sequence-profile and profile-profile alignment

  38. CLUSTALW properties • Sequences are weighted to compensate for biased representation • The substitution matrix used to score an alignment is chosen based on the expected similarity of the sequences • Position-specific gap-open profile penalties are multiplied by a modifier that is a function of the residues observed at the position

  39. CLUSTALW properties • Gap-open penalties are also decreased if the position is spanned by a consecutive stretch of five or more hydrophilic residues • Both gap-open and gap-extend penalties are increased if there are no gaps in a column but gaps occur nearby in the alignment • In the progressive alignment stage, if the score of an alignment is low, that alignment is deferred until more profile information has been accumulated

  40. Iterative refinement methods: Barton-Sternberg multiple alignment • Find the two sequences with the highest pairwise similarity and align them using standard pairwise dynamic programming alignment • Find the sequence that is most similar to a profile of the alignment of the first two and align it to the first two by profile-sequence alignment • Repeat until all sequences have been included in the multiple alignment

  41. Iterative refinement methods: Barton-Sternberg multiple alignment • Remove one sequence and realign it to a profile of the other aligned sequences by profile-sequence alignment • Repeat for each of the remaining sequences in turn • Repeat the previous realignment step a fixed number of times or until the alignment score converges
