
Multiple sequence alignment (msa)

Presentation Transcript


  1. Multiple sequence alignment (msa) Lecture 8 CS566

  2. Motivation
     • “Two swallows do not make a summer”
     • Discover conserved regions
     • Predict important regions of the protein
     • Discover domains
     • Search for additional members of a protein family (profile-based searching)
     • Build phylogenetic trees

  3. Topics
     • Scoring schemes
       • Pairwise
       • N-way
     • Optimal
       • Multidimensional dynamic programming
     • Heuristic algorithms
       • Progressive
       • Iterative

  4. Scoring schemes
     • Alignment score = $\sum_l C_l$
     • Column score $C_l$
     • Ideally
       • Based on n-way joint probability (n-generalized AAS)
     • Sum of pairs
       • $C_l = \sum_{i<j} s_{ij}$, based on amino acid substitution matrices
       • Gap-gap = 0; gap-char = $-g$
       • Commonest scheme used
       • Fallacious:
         • Assumes only 2-way and not n-way joint probabilities
         • Score not proportional to number of sequences in alignment
     • N-way sums
       • Need to know central point of reference (ancestral sequence)
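The sum-of-pairs scheme above is small enough to write out directly. The sketch below is only an illustration: the match/mismatch values stand in for a real amino acid substitution matrix, the gap penalty value is arbitrary, and the function names (`pair_score`, `column_score`, `alignment_score`) are hypothetical.

```python
from itertools import combinations

# Toy substitution scores (assumption): +1 for a match, -1 for a mismatch.
# A real analysis would look these up in an amino acid substitution matrix.
MATCH, MISMATCH = 1, -1
GAP_PENALTY = 2  # the slide's g: gap-char scores -g, gap-gap scores 0


def pair_score(a: str, b: str) -> int:
    """Score s_ij for one pair of symbols in a column."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -GAP_PENALTY
    return MATCH if a == b else MISMATCH


def column_score(column: str) -> int:
    """Column score C_l: sum of s_ij over all pairs i < j in the column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def alignment_score(rows: list[str]) -> int:
    """Alignment score: sum of C_l over all columns l."""
    assert len({len(r) for r in rows}) == 1, "rows must have equal length"
    return sum(column_score("".join(col)) for col in zip(*rows))
```

A fully conserved column of n sequences contributes n(n-1)/2 pair terms, so `column_score("AAA")` and `column_score("AAAA")` differ by more than the one added sequence would suggest, which is one way to see the slide's point that the score is not proportional to the number of sequences.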

  5. Multidimensional Dynamic Programming
     • Line up n sequences in a grid having n dimensions
     • Score each cell as the maximum of
       • Lining up all corresponding characters AND
       • All possible combinations of gaps and characters
     • Note the choice made
     • Reconstruct alignment by traceback
     • Global or local dynamic programming?
     • Space complexity?
     • Time complexity?
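The recurrence above can be sketched for any number of sequences by enumerating, for each cell, every way of advancing at least one sequence. This is a minimal global-alignment sketch under assumed toy scores (match +1, mismatch -1, gap-char -2, gap-gap 0) with sum-of-pairs column scoring; it returns only the optimal score and omits the traceback. The name `optimal_sp_score` is hypothetical.

```python
from itertools import combinations, product

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=2):
    """Toy pairwise score (assumption), with gap-gap = 0 and gap-char = -gap."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -gap
    return match if a == b else mismatch


def optimal_sp_score(seqs):
    """Optimal global sum-of-pairs score by n-dimensional dynamic programming."""
    n = len(seqs)
    # A move advances at least one sequence: 2**n - 1 choices per cell.
    moves = [m for m in product((0, 1), repeat=n) if any(m)]
    best = {(0,) * n: 0}
    # product() yields cells in lexicographic order, so every predecessor
    # of a cell has already been scored when the cell is visited.
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if cell in best:
            continue
        candidates = []
        for move in moves:
            prev = tuple(c - m for c, m in zip(cell, move))
            if min(prev) < 0:
                continue
            # Sequences that advance contribute a character, the rest a gap.
            column = [s[c - 1] if m else GAP for s, c, m in zip(seqs, cell, move)]
            score = sum(pair_score(a, b) for a, b in combinations(column, 2))
            candidates.append(best[prev] + score)
        best[cell] = max(candidates)
    return best[tuple(len(s) for s in seqs)]
```

In this sketch the table holds one entry per cell, that is, the product of (length + 1) over all sequences, and each cell inspects up to 2^n - 1 predecessors. Both factors grow exponentially with the number of sequences, so the approach is only practical for a handful of short sequences.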

  6. MSA – Efficient Multidimensional Dynamic Programming
     • Carrillo-Lipman MSA algorithm
       • Uses pairwise dynamic programming to identify sub-matrix regions of near-optimality
       • n-dimensional dynamic programming carried out within the intersection of the near-optimal regions
     • Still limited to only a few sequences
     • Is this an optimal algorithm or not?
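The idea of a "near-optimal region" can be made concrete with a short bound. The notation here is an assumption introduced for illustration, not the lecture's: $S(A)$ is the sum-of-pairs score of a multiple alignment $A$, $s(A_{ij})$ is the score of the pairwise alignment that $A$ induces on sequences $i$ and $j$, $\mathrm{opt}_{ij}$ is the optimal pairwise score for that pair, $A^*$ is an optimal multiple alignment, and $L$ is the score of any heuristic multiple alignment, so that $L \le S(A^*)$. The sketch assumes unweighted sum-of-pairs scoring with gap-gap = 0.

```latex
\begin{align*}
S(A^*) &= \sum_{i<j} s(A^*_{ij}) \;\ge\; L,
  \qquad s(A^*_{ij}) \le \mathrm{opt}_{ij} \\[4pt]
s(A^*_{kl}) &= S(A^*) - \sum_{\substack{i<j \\ (i,j)\neq(k,l)}} s(A^*_{ij})
  \;\ge\; L - \sum_{\substack{i<j \\ (i,j)\neq(k,l)}} \mathrm{opt}_{ij}
\end{align*}
```

Under these assumptions, a cell of the pairwise matrix for $(k, l)$ only needs to be kept if the best pairwise path through it scores at least the right-hand side, and the n-dimensional search can be restricted to cells whose projections pass this test for every pair.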

  7. Progressive alignment
     • New concepts
       • Consider aligning alignments to alignments/sequences en bloc
       • Hierarchical/sequential order of alignment (“Once a cobbler, always a cobbler”)
     • Heuristic
     • Fast

  8. Progressive alignment - Clustal
     • Compute all pairwise alignments
     • Convert alignment scores into distances
     • Build guide tree (phylogenetic tree)
     • Align sequences in order suggested by ‘guide tree’
     • Position-specific scoring system used
       • Gap costs depend on position
     • Composition-based scoring system used
       • Percentage similarity dictates choice of scoring matrix
       • Weighting based on composition bias
     • Only ‘cross-terms’ (profile-profile) used in scoring
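The first three steps above (pairwise comparison, distances, guide tree) can be sketched as follows. This is a minimal illustration, not Clustal's implementation: the distance is assumed to be one minus the fractional identity of an already-computed pairwise alignment, and the tree is built with UPGMA for brevity, whereas ClustalW is usually described as using neighbor joining. The names `identity_distance` and `upgma_guide_tree` are hypothetical.

```python
from itertools import combinations


def identity_distance(row_a: str, row_b: str) -> float:
    """Distance = 1 - fractional identity over columns without gaps.

    Assumes the two rows are already pairwise-aligned (equal length)
    and share at least one ungapped column.
    """
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in pairs)
    return 1.0 - matches / len(pairs)


def upgma_guide_tree(names, dist):
    """Build a guide tree as nested tuples from pairwise distances.

    dist maps frozenset({a, b}) -> distance for every pair of names.
    The most deeply nested pairs are the ones to align first.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average (UPGMA).
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))
```

For four sequences where A and B are close and C and D are close, the result is `(("A", "B"), ("C", "D"))`: align A with B, align C with D, then align the two resulting alignments to each other.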

  9. Progressive alignment - Clustal
     • ClustalV (Now history!)
     • ClustalW (Takes weighting into account for composition bias)
     • ClustalX (Graphical interface)

  10. Iterative refinement-1
     • “Once a cobbler, now a king!”
     • Iterative algorithm:
       • Compute all pairwise similarities
       • Start with best pair
       • Add ‘most-similar’ sequence to profile successively till none left
       • Remove and re-align each sequence till convergence
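The control flow of the iterative algorithm above can be sketched without committing to a particular aligner. Everything named here is hypothetical: `addition_order` assumes that "most similar to the profile" means highest mean similarity to the sequences already in it, and `refine` takes the re-alignment step and the scoring function as caller-supplied callbacks.

```python
from itertools import combinations


def addition_order(names, sim):
    """Order in which sequences join the profile.

    sim maps frozenset({a, b}) -> pairwise similarity. Starts from the
    best pair, then repeatedly adds the sequence with the highest mean
    similarity to the current profile members (an assumed criterion).
    """
    first = max(combinations(names, 2), key=lambda p: sim[frozenset(p)])
    profile = list(first)
    remaining = [n for n in names if n not in profile]
    while remaining:
        nxt = max(
            remaining,
            key=lambda r: sum(sim[frozenset((r, m))] for m in profile) / len(profile),
        )
        profile.append(nxt)
        remaining.remove(nxt)
    return profile


def refine(alignment, realign_one, score, max_rounds=100):
    """Remove and re-align each sequence in turn until no round improves.

    realign_one(alignment, i) must return a new alignment in which
    sequence i has been removed and re-aligned to the rest; score()
    rates an alignment. Both are supplied by the caller.
    """
    best = score(alignment)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(alignment)):
            candidate = realign_one(alignment, i)
            candidate_score = score(candidate)
            if candidate_score > best:
                alignment, best, improved = candidate, candidate_score, True
        if not improved:
            break  # converged: a full pass produced no better alignment
    return alignment
```

Unlike the progressive scheme, an early placement is not final here: every sequence gets another chance to be re-aligned against the profile built from all the others.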

  11. Iterative refinement-2
     • Genetic programming-based msa
       • Create initial random alignment
       • Score alignment
       • Retain better scoring half of alignment
       • Mutate remaining half of alignment with ideas from genetic recombination
         • Random gap insertion
         • En bloc shifts
         • Probabilistic order of alignment
       • Score resulting alignment
       • Iterate till convergence
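The loop above can be sketched as a small evolutionary search. This is a toy under several assumptions: the slide's "half" is read as half of a population of candidate alignments, random gap insertion is the only mutation implemented (en bloc shifts and probabilistic ordering are left out), scoring is sum-of-pairs with toy values, and a fixed number of generations stands in for a convergence test. All function names are hypothetical.

```python
import random
from itertools import combinations

GAP = "-"


def sp_score(rows, gap_penalty=2):
    """Toy sum-of-pairs score: match +1, mismatch -1, gap-char -g, gap-gap 0."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == GAP and b == GAP:
                continue
            total += -gap_penalty if GAP in (a, b) else (1 if a == b else -1)
    return total


def drop_empty_columns(rows):
    """Remove columns that consist only of gaps."""
    kept = [col for col in zip(*rows) if set(col) != {GAP}]
    return ["".join(row) for row in zip(*kept)]


def mutate(rows, rng):
    """Random gap insertion: one gap inside a random row, the others padded."""
    target = rng.randrange(len(rows))
    pos = rng.randrange(len(rows[target]) + 1)
    padded = [
        r[:pos] + GAP + r[pos:] if k == target else r + GAP
        for k, r in enumerate(rows)
    ]
    return drop_empty_columns(padded)


def evolve(seqs, pop_size=20, generations=50, seed=0):
    """Keep the better-scoring half each generation and mutate the rest."""
    rng = random.Random(seed)
    width = max(len(s) for s in seqs)
    start = [s.ljust(width, GAP) for s in seqs]
    # Initial population: the padded sequences with one random gap each.
    population = [mutate(start, rng) for _ in range(pop_size)]
    half = pop_size // 2
    for _ in range(generations):
        population.sort(key=sp_score, reverse=True)
        population = population[:half] + [mutate(a, rng) for a in population[half:]]
    return max(population, key=sp_score)
```

Because the better half is carried over unchanged, the best score in the population never decreases from one generation to the next. With gap insertion as the only operator the search is weak, which is why the slide lists further operators.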