Multiple Sequence Alignments

Multiple Sequence Alignments. It is God’s privilege to conceal things, but the kings’ pride is to research them. (Proverbs 25:2; ascribed to King Solomon of Israel, BC 1000). 1-4, Jan, 2006 Protein Folding Winter School Keehyoung Joo School of Computational Sciences, KIAS , Seoul, Korea.

Presentation Transcript


  1. Multiple Sequence Alignments It is God’s privilege to conceal things, but the kings’ pride is to research them. (Proverbs 25:2; ascribed to King Solomon of Israel, BC 1000) 1-4, Jan, 2006 Protein Folding Winter School Keehyoung Joo School of Computational Sciences, KIAS, Seoul, Korea

  2. The major goal of computational sequence analysis is to predict the structure and function of genes and proteins from their sequence.

  3. Contents
     • How to make your model from sequence?
     • What is a Multiple Sequence Alignment (MSA)?
     • How can I use an MSA (motivation)?
     • What is the problem with MSA?
        • The choice of the sequences
        • The choice of an objective function
        • The optimization of that function
     • How to make an MSA?

  4. How to make your model from sequence?
     • Tertiary structure prediction methods
        • Homology modeling
        • Fold recognition
        • Ab initio method
     [Figure: an unknown sequence is searched against a fold database (Protein Data Bank) to find template folds and an alignment; the model is then built from the templates and the alignment.]

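  5. What is a Multiple Sequence Alignment? An MSA can be seen as a generalization of pairwise sequence alignment: instead of two sequences, three or more are aligned at once, so that corresponding residues fall in the same column.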

  6. How can I use an MSA (Motivation)?
     • Clustering, classification, or categorization of genes/proteins.
     • Identification of conserved regions.
     • Detecting point mutations.
     • Deducing evolutionary relationships and phylogenetic trees.
     • Assisting in the prediction of secondary and tertiary structure.

  7. What is the problem with MSA?
     • It stands at the crossroads of three distinct technical difficulties:
        • Choice of the sequences (database search starting from an unknown sequence)
        • Choice of an objective function: what is a good alignment? (biology)
        • Optimization of that function: what is the good alignment? (computation)

  8. • The choice of the sequences: sequences sharing a common ancestor (homologous sequences)
        • PSI-BLAST, FASTA, various search tools
     • The choice of an objective function: a biological problem that lies in the definition of correctness
        • Sum of pairs, entropy score, consistency-based, …
     • The optimization of that function
        • Exact algorithms (dynamic programming)
        • Progressive alignment (ClustalW)
        • Iterative approaches (SA: simulated annealing, GA: genetic algorithm, …)

  9. Example: Sum-of-pairs score
     Sequence alignments
        Seq A: ARGTCAGATACGLAG---PGMCTETWV
        Seq B: ARATCGGAT---IAGTIYPGMCTHTWV
     Substitution scores are represented in matrices. The popular ones are PAM and BLOSUM.

  10. Example: Sum-of-pairs score (cont.)
      Multiple sequence alignments
         Seq A1: ARGTCAGATACGLAG---PGMCTETWV----
         Seq A2: ARATCGGAT---IAGTIYPGMCTHTWVIAGQ
         Seq A3: ARATCE--TACG--GTI-PGMCTHTWVIA--
      Exact method: multi-dimensional dynamic programming
         - Time complexity O(L^n 2^n), space complexity O(L^n), for n sequences of length L
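The sum-of-pairs idea on the two slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MATCH`, `MISMATCH` and `GAP` values are a made-up toy scheme (the slides point to PAM or BLOSUM matrices instead), gap-gap pairs are scored as zero, and the function names are hypothetical.

```python
from itertools import combinations

# Assumed toy scoring scheme (not from the slides): a real analysis would
# look up substitution scores in a PAM or BLOSUM matrix instead.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score one aligned pair of symbols; '-' denotes a gap."""
    if a == "-" and b == "-":
        return 0  # assumption: gap-gap pairs contribute nothing
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: list[str]) -> int:
    """Add pair_score over every column and every pair of rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


if __name__ == "__main__":
    # The three-sequence alignment shown on the slide.
    msa = [
        "ARGTCAGATACGLAG---PGMCTETWV----",
        "ARATCGGAT---IAGTIYPGMCTHTWVIAGQ",
        "ARATCE--TACG--GTI-PGMCTHTWVIA--",
    ]
    print(sum_of_pairs(msa))
```

Scoring a given alignment this way is cheap; the exponential cost quoted on the slide comes from searching over all possible alignments for the one with the best score.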

  11. How to make a MSA (Methods)

  12. Recent research in the literature
      • MAFFT (2002): based on the fast Fourier transform
      • MUSCLE (2004): progressive alignment, pairwise profile alignment, position-specific gap penalty
      • PROBCONS (2005): progressive alignment, probability table using HMM, probabilistic consistency-based MSA

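  13. Example: Progressive alignment
      • Pairwise alignment of all sequence pairs: 1 + 2, 1 + 3, 1 + 4, 2 + 3, 2 + 4, 3 + 4
      • Guide tree
      • MSA by adding sequences
      [Figure: sequences 1 to 4, their guide tree, and the alignment growing as sequences are added]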

  14. Progressive alignment (cont.)
      • Distance matrix: displays the distances of all sequence pairs, D = 1 - S, where S is the pairwise similarity.
      • Guide tree: built by UPGMA (unweighted pair group method with arithmetic averages) or the Neighbour-Joining method.
      [Figure: sequences 1 to 5, their distance matrix, and the resulting guide tree]
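As a rough illustration of D = 1 - S, the sketch below fills a distance matrix from pairwise alignments. The slide does not say how S is measured; taking S as the fraction of identical residues over the columns where neither sequence has a gap is an assumption made here, and the function names are hypothetical.

```python
def fractional_identity(row_a: str, row_b: str) -> float:
    """Assumed similarity S: identical residues / columns without a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def distance_matrix(names, pairwise):
    """Build a symmetric matrix of D = 1 - S.

    names    : list of sequence names (fixes the row/column order)
    pairwise : dict mapping (name_a, name_b) to the two aligned rows
    """
    index = {name: k for k, name in enumerate(names)}
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), (row_a, row_b) in pairwise.items():
        d = 1.0 - fractional_identity(row_a, row_b)
        matrix[index[a]][index[b]] = d
        matrix[index[b]][index[a]] = d
    return matrix


if __name__ == "__main__":
    # The pairwise alignment of Seq A and Seq B from the sum-of-pairs slide.
    pairwise = {
        ("A", "B"): ("ARGTCAGATACGLAG---PGMCTETWV",
                     "ARATCGGAT---IAGTIYPGMCTHTWV"),
    }
    for row in distance_matrix(["A", "B"], pairwise):
        print(row)
```

Other similarity measures (for example scores normalised by alignment length) would slot into `fractional_identity` without changing the rest.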

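  15. UPGMA clustering (guide tree)
      Distances to a merged cluster are the arithmetic averages of the original pairwise distances.
      Initial distances d_ij:
               1    2    3    4    5
          1    0    2    6    9    7
          2         0    5    7    7
          3              0    5    4
          4                   0    3
          5                        0
      Merge 1 and 2 (d = 2) into u:
               u    3    4    5
          u    0    5.5  8    7
          3         0    5    4
          4              0    3
          5                   0
      Merge 4 and 5 (d = 3) into v:
               u    3    v
          u    0    5.5  7.5
          3         0    4.5
          v              0
      Merge 3 and v (d = 4.5) into w:
               u    w
          u    0    6.8
          w         0
      [Figure: the guide tree after each merge]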
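The UPGMA merges for this five-sequence example can be replayed with a short Python sketch. It uses the integer distances d_ij legible on the slide and recomputes every cluster-to-cluster distance as the plain average over all leaf pairs, which is the UPGMA definition. The function names and the tuple-of-leaves representation are illustrative choices, not part of the slides.

```python
from itertools import combinations

# Pairwise distances d_ij for sequences 1..5, as read from the slide.
SLIDE_DISTANCES = {
    (1, 2): 2, (1, 3): 6, (1, 4): 9, (1, 5): 7,
    (2, 3): 5, (2, 4): 7, (2, 5): 7,
    (3, 4): 5, (3, 5): 4,
    (4, 5): 3,
}


def cluster_distance(c1, c2, leaf_dist):
    """UPGMA distance: average of all leaf-to-leaf distances between clusters."""
    total = sum(leaf_dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def upgma(labels, leaf_dist):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    Each cluster is a tuple of the original leaf labels it contains.
    """
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], leaf_dist),
        )
        merges.append((c1, c2, cluster_distance(c1, c2, leaf_dist)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    leaf_dist = {frozenset(pair): d for pair, d in SLIDE_DISTANCES.items()}
    for a, b, d in upgma([1, 2, 3, 4, 5], leaf_dist):
        print(a, "+", b, "at distance", round(d, 2))
```

Recomputing averages from the leaves on every step is quadratic per merge and only suitable for small examples; practical implementations update the matrix incrementally instead.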

  16. Progressive alignment (cont.)
      • Columns, once aligned, are never changed; only new gaps are inserted.
      • Depends strongly on the pairwise alignments and on the initial starting sequences.
      • No guarantee that the global optimal solution will be found.
      • When sequence identity is less than 25-30%, this approach becomes much less reliable.
      [Figure: guide tree and alignment of alignments]

  17. Progressive Alignment: Discussion
      • Strengths:
         • Speed
         • Progression is biologically sensible (aligns using a tree)
      • Weaknesses:
         • No objective function
         • No way of quantifying whether or not the alignment is good
         • Local minimum problem

  18. Consistency-based score function
      COFFEE score function (Cédric Notredame): given a set of sequences, the optimal MSA is defined as the one that agrees the most with all the possible optimal pairwise alignments.
      Score(Aij) = number of aligned pairs of residues that are shared between Aij and the library, where Aij is the pairwise alignment of sequences i and j as it appears in the MSA and the library is the collection of pairwise alignments.
      • Does not depend on a specific substitution matrix
      • Position-dependent alignment
      • The most consistent alignments are often closer to the truth
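The counting rule for Score(Aij) can be illustrated with a small Python sketch. It is a simplified, unweighted reading of the slide: the library is assumed to be a dictionary of residue-index pairs per sequence pair, and any weighting or normalisation used by the published COFFEE function is left out because the slide does not describe it. All names are hypothetical.

```python
def aligned_pairs(row_a: str, row_b: str) -> set[tuple[int, int]]:
    """Residue-index pairs (position in a, position in b) sharing a column."""
    pairs = set()
    i = j = 0  # running residue indices, gaps excluded
    for x, y in zip(row_a, row_b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return pairs


def consistency_score(msa: dict[str, str], library: dict) -> int:
    """Count residue pairs shared between the MSA and the library.

    msa     : sequence name -> aligned row of the MSA
    library : (name_a, name_b) -> set of residue-index pairs taken from
              the reference pairwise alignment of those two sequences
    """
    score = 0
    for (a, b), reference_pairs in library.items():
        score += len(aligned_pairs(msa[a], msa[b]) & reference_pairs)
    return score


if __name__ == "__main__":
    # Toy data: the reference pairwise alignment has no gaps at all.
    library = {("s1", "s2"): aligned_pairs("ACGT", "ACGT")}
    msa = {"s1": "AC-GT", "s2": "A-CGT"}
    print(consistency_score(msa, library))
```

Because only residue positions are compared, no substitution matrix is needed, which matches the first bullet on the slide.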

  19. Summary
      • MSAs are essential tools in computational biology and bioinformatics. They are required for structure/function analysis and structure prediction.
      • No perfect method exists for assembling an MSA, and all the available methods rely on approximations.
      • The most commonly used methods for MSA use a progressive alignment algorithm (ClustalW).
      • Recent progress has focused on the design of iterative (Prrp, SAGA) and consistency-based methods (T-Coffee, PROBCONS).

  20. MSA applications
      • Profile-profile alignment
      • Profile: a table that lists the frequencies of each amino acid in each position of an MSA.
      • A profile can be used in database searches
         • Find new sequences that match the profile
         • Improve search sensitivity
         • Improve search accuracy

  21. Example: Profiles
      • Profile: a table that lists the frequencies of each amino acid in each position of a protein sequence.
      • Frequencies are calculated from an MSA containing a domain of interest
      • Allows us to identify a consensus sequence
      • The derived scoring scheme allows us to align a new sequence to the profile
      • A profile can be used in database searches
         • Find new sequences that match the profile
      • Profiles are also used to compute multiple alignments heuristically
         • Progressive alignment
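A profile as described above is essentially a per-column frequency table. The following Python sketch builds one from an MSA and reads off a consensus sequence. Ignoring gap symbols when computing frequencies is an assumption (the slides do not say how gaps are treated), and the function names are hypothetical.

```python
from collections import Counter


def build_profile(msa: list[str]) -> list[dict[str, float]]:
    """Return one {residue: frequency} table per alignment column.

    Assumption: gap symbols ('-') are ignored, so frequencies in a column
    are taken over the residues actually present in that column.
    """
    profile = []
    for column in zip(*msa):
        counts = Counter(symbol for symbol in column if symbol != "-")
        total = sum(counts.values())
        if total:
            profile.append({s: n / total for s, n in counts.items()})
        else:
            profile.append({})  # column made of gaps only
    return profile


def consensus(profile: list[dict[str, float]]) -> str:
    """Most frequent residue per column; '-' for all-gap columns."""
    return "".join(max(col, key=col.get) if col else "-" for col in profile)


if __name__ == "__main__":
    msa = ["ACG", "ACT", "A-T"]
    profile = build_profile(msa)
    for position, column in enumerate(profile, start=1):
        print(position, column)
    print(consensus(profile))
```

Turning such frequencies into a scoring scheme for aligning new sequences (for example log-odds scores with pseudocounts) is a further step that this sketch does not attempt.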