Multiple Alignment and Phylogenetic Trees


morrison

Presentation Transcript


  1. Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics

  2. Multiple Sequence Alignment • One amino acid sequence plays coy; a pair of homologous sequences whisper; many aligned sequences shout out loud. • Very informative

  3. Definition • A global alignment of a set of sequences is obtained by inserting gap characters into each sequence so that • the resulting sequences all have the same length, and • no “column” consists only of gap characters
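The definition can be checked mechanically. This is a minimal sketch that assumes sequences are plain Python strings and "-" is the gap character. The function name `is_global_alignment` is hypothetical and does not come from the course material.

```python
def is_global_alignment(originals: list[str], aligned: list[str], gap: str = "-") -> bool:
    """Check the three conditions of the slide's definition of a global alignment."""
    if not aligned or len(originals) != len(aligned):
        return False
    # Only gaps may be inserted: removing them must give back each original sequence.
    for seq, row in zip(originals, aligned):
        if row.replace(gap, "") != seq:
            return False
    # All aligned rows must have the same length.
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        return False
    # No column may consist only of gap characters.
    for col in range(width):
        if all(row[col] == gap for row in aligned):
            return False
    return True
```

For example, `["ACGT", "A-GT"]` is a valid alignment of `"ACGT"` and `"AGT"`. `["AC-GT", "A--GT"]` is rejected because its third column contains only gaps.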

  4. Example: Chromo domains aligned

  5. Use of alignments • High sequence similarity usually implies significant structural and/or functional similarity; the converse need not hold • Homologous proteins (sharing a common ancestor) can differ substantially over large parts of their sequences yet still retain common 2D patterns, 3D patterns, or a common active site or binding site. • Comparing several sequences in a family can reveal what is common to the family. A feature shared by several sequences can be significant when all of them are considered together, but need not appear significant when only two are compared. • Multiple alignment can be used to infer evolutionary history.

  6. Use of alignments • Predict features of aligned objects • conserved positions • structurally/functionally important
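A simple way to flag conserved positions is to measure, column by column, how often the most frequent residue occurs. This sketch makes several assumptions. The hypothetical helper `conserved_columns` uses plain residue identity rather than substitution-matrix similarity. A gap counts against conservation. The `threshold` value is an illustrative choice.

```python
from collections import Counter


def conserved_columns(alignment: list[str], threshold: float = 1.0, gap: str = "-") -> list[int]:
    """Return indices of columns whose most frequent residue covers >= threshold of all rows."""
    n_rows = len(alignment)
    hits = []
    for col in range(len(alignment[0])):
        residues = [row[col] for row in alignment if row[col] != gap]
        if not residues:
            continue
        _, top_count = Counter(residues).most_common(1)[0]
        # Divide by all rows, not just non-gap rows, so gaps lower the score.
        if top_count / n_rows >= threshold:
            hits.append(col)
    return hits
```

With `["ACDEF", "ACDQF", "GC-EF"]`, only columns 1 and 4 are fully conserved. Lowering `threshold` to 0.6 also accepts the columns where two of the three rows agree.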

  7. Conserved positions

  8. Use of alignments • Predict features of aligned objects • conserved positions • structurally/functionally important • patterns of hydrophobicity/hydrophilicity • secondary structure elements
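Hydrophobicity patterns can be read off an alignment by computing, for each column, the fraction of hydrophobic residues. This is a sketch only. The hydrophobic set `AVILMFWC` is an assumed, simplified choice (published hydrophobicity scales differ), and `hydrophobic_profile` and `pattern_string` are hypothetical names.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic residues; other scales exist


def hydrophobic_profile(alignment: list[str], gap: str = "-") -> list[float]:
    """Fraction of hydrophobic residues among the non-gap residues of each column."""
    fractions = []
    for col in range(len(alignment[0])):
        residues = [row[col] for row in alignment if row[col] != gap]
        if residues:
            fractions.append(sum(r in HYDROPHOBIC for r in residues) / len(residues))
        else:
            fractions.append(0.0)
    return fractions


def pattern_string(fractions: list[float], cutoff: float = 0.5) -> str:
    """Mark each column 'h' (mostly hydrophobic) or 'p' (mostly polar)."""
    return "".join("h" if f >= cutoff else "p" for f in fractions)
```

The toy input `["LKKLLEA", "IEKVLKA", "LRELIEG"]` gives `"hpphhph"`. Hydrophobic columns recur every three to four positions, which is the spacing typical of one face of an amphipathic α-helix.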

  9. Helix pattern

  10. Use of alignments • Predict features of aligned objects • conserved positions • structurally/functionally important • patterns of hydrophobicity/hydrophilicity • secondary structure elements • “gappy” regions • loops/variable regions
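Gappy regions, the candidate loops or variable regions, can be located by scanning for runs of columns with a high gap fraction. This is a minimal sketch. The hypothetical `gappy_regions` helper and its `min_gap_fraction` cutoff are illustrative assumptions, not a standard definition of a loop.

```python
def gappy_regions(alignment: list[str], min_gap_fraction: float = 0.5, gap: str = "-") -> list[tuple[int, int]]:
    """Return (first, last) column index pairs for maximal runs of gap-rich columns."""
    n_rows = len(alignment)
    width = len(alignment[0])
    regions = []
    start = None
    for col in range(width):
        frac = sum(row[col] == gap for row in alignment) / n_rows
        if frac >= min_gap_fraction:
            if start is None:
                start = col  # a gappy run begins here
        elif start is not None:
            regions.append((start, col - 1))  # the run ended at the previous column
            start = None
    if start is not None:
        regions.append((start, width - 1))  # run extends to the last column
    return regions
```

In `["ACD---EFG", "ACDKLMEF-", "AC----EFG"]`, columns 3 to 5 are mostly gaps and are reported as one region, `(3, 5)`.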

  11. Loop? Loop? Loop?

  12. Use of alignments: make patterns/profiles • A profile or pattern built from an alignment can be matched against a sequence database to identify new family members • Profiles/patterns can be used to predict the family membership of new sequences • Databases of profiles/patterns • PROSITE • Pfam • PRINTS • ...
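A profile can be sketched as a position-specific log-odds table built from the alignment columns. This is a simplified illustration and not the scoring used by PROSITE, Pfam, or PRINTS. It assumes a uniform background over the 20 standard amino acids, adds a pseudocount of 1 per residue, ignores gaps, and scores only ungapped windows. `build_profile` and `best_hit` are hypothetical names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment: list[str], pseudocount: float = 1.0, gap: str = "-") -> list[dict[str, float]]:
    """Return one {residue: log2-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequencies
    profile = []
    for col in range(len(alignment[0])):
        residues = [row[col] for row in alignment if row[col] != gap]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((residues.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile: list[dict[str, float]], sequence: str) -> tuple[float, int]:
    """Slide the profile along an ungapped sequence; return (best score, start index)."""
    width = len(profile)
    best_score, best_start = -math.inf, -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20-letter alphabet (e.g. X) contribute 0.
        score = sum(column.get(aa, 0.0) for column, aa in zip(profile, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

Usage: `best_hit(build_profile(["WKLV", "WRLV", "WKIV"]), "AAWKLVAA")` returns a positive score at start index 2, where the window matches the most frequent residue in every column. A database search would apply the same scan to every sequence and keep the hits above a chosen cutoff.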

  13. PROSITE: motifs for classification • A protein sequence is compared against PROSITE patterns 1 to n, each characterizing one family (Family 1 to Family n) • A motif is written either as a pattern (a regular expression) or as a profile

  14. Pattern from alignment [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]
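PROSITE patterns translate almost directly into regular expressions. In the pattern syntax, `x` is any residue, `[...]` lists allowed residues, `{...}` lists forbidden ones, `(n)` or `(n,m)` gives a repeat count, elements are separated by `-`, and `<`/`>` anchor the pattern to the N-/C-terminus. The sketch below handles only that common subset; for example, it does not handle `>` inside brackets. The helper names and the entries in `FAMILIES` are hypothetical placeholders, apart from the slide-14 pattern itself.

```python
import re

SLIDE_14 = "[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]"


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern (common syntax subset) into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):
        suffix, pattern = "$", pattern[:-1]
    pieces = []
    for element in pattern.split("-"):
        # Split an element into its core and an optional (n) or (n,m) repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = m.groups()
        if core == "x":
            piece = "."
        elif core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"
        else:
            piece = core  # a single residue or a [...] class is already valid regex
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return prefix + "".join(pieces) + suffix


# Hypothetical family table: one pattern per family, as in the PROSITE figure.
FAMILIES = {"slide_14_family": SLIDE_14, "hypothetical_family": "C-x(2)-C"}


def classify(sequence: str, families: dict[str, str]) -> list[str]:
    """Return the names of families whose pattern occurs anywhere in the sequence."""
    return [name for name, pat in families.items()
            if re.search(prosite_to_regex(pat), sequence)]
```

The slide-14 pattern becomes `[FYL].[LIVMC][KR]W.[GDNR][FYWLE].{5,6}[ST]W[ES][PSTDN].{3}[LIVMC]`. A constructed test sequence such as `"MMFALKWAGFAAAAASWEPAAALGG"` is assigned only to `slide_14_family`.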

  15. Alignment problem • Given a set of sequences, produce a multiple alignment that corresponds as closely as possible to the biological relationships between the corresponding biomolecules

  16. For homologous proteins • Two residues should be aligned (on top of each other) • if they are homologous (evolved from the same residue in a common ancestor protein) • if they are structurally equivalent

  17. Automatic approach • Need a way of scoring alignments • a fitness function that quantifies the “goodness” of an alignment • Need an algorithm for finding alignments with good scores • Not all methods provide a scoring function for the final alignment!
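The slides leave the fitness function open. A common choice, used here only as an illustration, is the sum-of-pairs (SP) score: score every pair of rows in every column and add the results. The scores below (match +1, mismatch -1, residue against gap -2, gap against gap 0) are toy assumptions. A realistic scheme would use a substitution matrix such as BLOSUM62 together with affine gap penalties.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed toy scores, not a real substitution matrix


def pair_score(a: str, b: str, gap: str = "-") -> int:
    """Score two characters that share a column."""
    if a == gap and b == gap:
        return 0  # gap-gap pairs are conventionally ignored
    if a == gap or b == gap:
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum pair_score over every pair of rows in every column."""
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total
```

For `["AC-T", "ACGT", "A-GT"]`, the two fully matched columns each contribute +3 and the two gapped columns each contribute -3, so the total is 0.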

  18. Analysis of fitness function • One can test whether the alignments that are optimal under a given fitness function correspond well to the biological relationships between the sequences • For example, if the structures of (some of) the proteins are known.
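One way to make this test concrete assumes that a trusted reference alignment is available, for instance one derived from known structures. The test then measures what fraction of the residue pairs aligned in the reference are also aligned in the candidate alignment. This sketch uses hypothetical helper names and is not the official scoring program of any particular benchmark.

```python
from itertools import combinations


def aligned_pairs(alignment: list[str], gap: str = "-") -> set:
    """Set of ((i, p), (j, q)): residue p of row i shares a column with residue q of row j."""
    counters = [0] * len(alignment)  # next residue index in each row
    pairs = set()
    for column in zip(*alignment):
        positions = []
        for i, ch in enumerate(column):
            if ch != gap:
                positions.append((i, counters[i]))
                counters[i] += 1
        pairs.update(combinations(positions, 2))  # rows appear in increasing order
    return pairs


def reference_agreement(candidate: list[str], reference: list[str]) -> float:
    """Fraction of reference residue pairs reproduced by the candidate alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(candidate) & ref) / len(ref)
```

Against the reference `["ACGT", "A-GT"]`, the candidate `["ACGT", "AG-T"]` recovers 2 of the 3 reference pairs, giving about 0.67. A fitness function whose optimal alignments score consistently high on such tests is a better model of the biology.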

  19. Align by use of dynamic programming • Dynamic programming finds the best alignment of k sequences under a given scoring scheme • For two sequences there are three different column types • For three sequences there are seven different column types (x denotes an amino acid, - a gap):
      Sequence1  x - x x - - x
      Sequence2  x x - x - x -
      Sequence3  x x x - x - x
  • Time complexity: with all sequences of length n, the table has O(n^k) cells, and each cell considers 2^k - 1 possible moves (one per column type), so the total time is O(2^k n^k)
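The column-type counts follow from a simple rule. Each of the k rows holds either a residue or a gap, which gives 2^k combinations, and the all-gap column is excluded. The sketch below enumerates the column types and gives a rough operation count for exact dynamic programming. Both helper names are hypothetical, and `dp_work` ignores the cost of scoring each column.

```python
from itertools import product


def column_types(k: int) -> list[str]:
    """All residue/gap patterns for a column of k sequences, excluding all-gap."""
    return ["".join(p) for p in product("x-", repeat=k) if "x" in p]


def dp_work(n: int, k: int) -> int:
    """Rough cost of exact k-sequence DP: (n + 1)^k cells times (2^k - 1) moves."""
    return (n + 1) ** k * (2 ** k - 1)
```

`column_types(2)` gives `["xx", "x-", "-x"]`, and `column_types(3)` gives seven patterns. For three sequences of length 100, `dp_work(100, 3)` is already 7,212,107, and every additional sequence multiplies the cost by roughly 2n. This is why exact DP is practical only for a few short sequences.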

  20. Use of dynamic programming • Dynamic programming finds the best alignment of k sequences under a given scoring scheme

  21. Algorithm for dynamic programming
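The algorithm figure from this slide is not reproduced in the transcript. The sketch below is an illustrative implementation of exact dynamic programming for three sequences. It uses toy sum-of-pairs scores (match +1, mismatch -1, residue against gap -2, gap against gap 0), which are assumptions, and its function names are hypothetical. Cell (i1, i2, i3) holds the best score for aligning the prefixes s1[:i1], s2[:i2], s3[:i3]. Each of the seven moves corresponds to one of the seven column types.

```python
from itertools import product

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed toy scores


def pair_score(a: str, b: str) -> int:
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(col: tuple[str, str, str]) -> int:
    """Sum-of-pairs score of one three-row column."""
    a, b, c = col
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)


# A move marks which sequences contribute a residue (1) or a gap (0) to the column.
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]


def align3(s1: str, s2: str, s3: str) -> tuple[int, list[str]]:
    seqs = (s1, s2, s3)
    dims = [len(s) + 1 for s in seqs]
    score = {(0, 0, 0): 0}
    back = {}
    for i1 in range(dims[0]):
        for i2 in range(dims[1]):
            for i3 in range(dims[2]):
                cell = (i1, i2, i3)
                if cell == (0, 0, 0):
                    continue
                best, best_move = float("-inf"), None
                for move in MOVES:
                    prev = tuple(c - m for c, m in zip(cell, move))
                    if min(prev) < 0:
                        continue  # move would step outside the table
                    col = tuple(seqs[t][cell[t] - 1] if move[t] else "-" for t in range(3))
                    cand = score[prev] + column_score(col)
                    if cand > best:
                        best, best_move = cand, move
                score[cell] = best
                back[cell] = best_move
    # Trace back from the far corner to rebuild the three gapped rows.
    end = tuple(d - 1 for d in dims)
    rows = ["", "", ""]
    cell = end
    while cell != (0, 0, 0):
        move = back[cell]
        for t in range(3):
            rows[t] = (seqs[t][cell[t] - 1] if move[t] else "-") + rows[t]
        cell = tuple(c - m for c, m in zip(cell, move))
    return score[end], rows
```

Usage: `align3("ACGT", "ACGT", "ACGT")` returns `(12, ["ACGT", "ACGT", "ACGT"])`. When several alignments tie, the one returned depends on the order of `MOVES`. Extending the same approach to k sequences means a k-dimensional table, which matches the O(2^k n^k) cost noted for slide 19.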
