
Burkhard Morgenstern, Institut für Mikrobiologie und Genetik

Burkhard Morgenstern, Institut für Mikrobiologie und Genetik. Molecular Evolution and Reconstruction of Phylogenetic Trees (Molekulare Evolution und Rekonstruktion von phylogenetischen Bäumen), WS 2006/2007. Goal: Phylogeny reconstruction based on molecular sequence data (DNA, RNA, protein sequences). Multiple sequence alignment.


Presentation Transcript


  1. Burkhard Morgenstern, Institut für Mikrobiologie und Genetik. Molecular Evolution and Reconstruction of Phylogenetic Trees (Molekulare Evolution und Rekonstruktion von phylogenetischen Bäumen), WS 2006/2007

  2. Goal: Phylogeny reconstruction based on molecular sequence data (DNA, RNA, protein sequences)

  3. Multiple sequence alignment • Molecular phylogeny reconstruction relies on comparative nucleic acid and protein sequence analysis • Alignment most important tool for sequence comparison • Multiple alignment contains more information than pair-wise alignment

  4. Tools for multiple sequence alignment
     Y I M Q E V Q Q E R
     • Sequence duplicates in history (e.g. speciation event)

  5. Tools for multiple sequence alignment
     Y I M Q E V Q Q E R

  6. Tools for multiple sequence alignment
     Y I M Q E V Q Q E R
     Y I M Q E V Q Q E R

  7. Tools for multiple sequence alignment
     Y I M Q E A Q Q E R
     Y L M Q E V Q Q E R
     • Substitutions occur

  8. Tools for multiple sequence alignment
     Y I M Q E A Q Q E R
     Y L M Q E V Q Q E R

  9. Tools for multiple sequence alignment
     Y A I M Q E A Q Q E R
     Y L M - - V Q Q E R V
     • Insertions/deletions (indels) occur

  10. Tools for multiple sequence alignment
      Y A I M Q E A Q Q E R
      Y L M - - V Q Q E R V

  11. Tools for multiple sequence alignment
      Y A I M Q E A Q Q E R
      Y L M V Q Q E R V
      • Because of insertions/deletions, sequence similarity is no longer immediately visible!

  12. Tools for multiple sequence alignment
      Y A I M Q E A Q Q E R -
      Y - L M V - - Q Q E R V
      • Alignment brings together related parts of the sequences by inserting gaps into sequences

  13. Tools for multiple sequence alignment
      Y A I M Q E A Q Q E R -
      Y - L M V - - Q Q E R V

  14. Tools for multiple sequence alignment
      Y A I M Q E A Q Q E R -
      Y - L M V - - Q Q E R V
      • Mismatches correspond to substitutions
      • Gaps correspond to indels
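The reading of an alignment as a list of evolutionary events can be sketched in a few lines of Python. This is only an illustration: the function name `count_events` is hypothetical, and the two rows are the 12-column alignment from the slides above, written without spaces.

```python
def count_events(row_a: str, row_b: str) -> dict:
    """Count match, mismatch and gap columns in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            counts["gap"] += 1        # column interpreted as part of an indel
        elif a == b:
            counts["match"] += 1      # conserved residue
        else:
            counts["mismatch"] += 1   # column interpreted as a substitution
    return counts


# The two aligned rows from the slides.
events = count_events("YAIMQEAQQER-", "Y-LMV--QQERV")
print(events)
```

Each gap column is counted separately here; adjacent gap columns may of course stem from a single longer indel.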

  15. Tools for multiple sequence alignment • Pairwise alignment: alignment of two sequences • Multiple alignment: alignment of N > 2 sequences

  16. Tools for multiple sequence alignment
      s1  R Y I M R E A Q Y E S A Q
      s2  R C I V M R E A Y E
      s3  Y I M Q E V Q Q E R
      s4  W R Y I A M R E Q Y E
      • Assumption: sequence family related by common ancestry; similarity due to common history
      • Sequence similarity not obvious (insertions and deletions may have happened)

  17. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - Y E - - -
      s3  - - Y I - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -
      • Multiple alignment = arrangement of sequences by introducing gaps
      • Alignment reveals sequence similarities

  18. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - Y E - - -
      s3  - - Y I - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -

  19. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - Y E - - -
      s3  - - Y I - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -

  20. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - Y E - - -
      s3  - - Y I - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -
      General information in multiple alignment:
      • Functionally important regions more conserved than non-functional regions
      • Local sequence conservation indicates functionality!
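A rough way to make "local sequence conservation" measurable is to score each alignment column by how many sequences share its most frequent residue. The sketch below is one simple choice among many, not a method given in the lecture; the name `column_conservation` and the decision to ignore gap characters are assumptions.

```python
from collections import Counter

# The four-sequence alignment from the slide, written without spaces.
ALIGNMENT = [
    "-RYI-MREAQYESAQ",  # s1
    "-RCIVMREA-YE---",  # s2
    "--YI-MQEVQQER--",  # s3
    "WRYIAMRE-QYE---",  # s4
]


def column_conservation(rows):
    """Per column: fraction of all rows that carry the most frequent residue.

    Gap characters are not counted as residues (an assumption of this sketch).
    """
    scores = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(rows))
    return scores


scores = column_conservation(ALIGNMENT)
fully_conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(fully_conserved)  # 0-based indices of columns identical in all four rows
```

In this toy alignment the fully conserved columns are the I, M and both E columns.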

  21. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - Y E - - -
      s3  - - Y I - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -
      Phylogeny reconstruction based on multiple alignment:
      • Estimate pairwise distances between sequences (distance-based methods for tree reconstruction)
      • Estimate evolutionary events (parsimony and maximum likelihood methods)
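As a minimal sketch of the first bullet, pairwise distances can be estimated from the alignment as the fraction of differing residues among the columns where neither sequence has a gap (often called the p-distance). The slides do not say which distance measure is meant, so this choice and the name `p_distance` are assumptions.

```python
# The four-sequence alignment from the slide, written without spaces.
ALIGNMENT = {
    "s1": "-RYI-MREAQYESAQ",
    "s2": "-RCIVMREA-YE---",
    "s3": "--YI-MQEVQQER--",
    "s4": "WRYIAMRE-QYE---",
}


def p_distance(row_a: str, row_b: str) -> float:
    """Fraction of differing residues over columns without a gap in either row."""
    compared = 0
    differing = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip columns where one sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else float("nan")


names = sorted(ALIGNMENT)
distances = {
    (x, y): p_distance(ALIGNMENT[x], ALIGNMENT[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A distance-based tree method would take such a matrix as input; real analyses usually correct the raw fraction for multiple substitutions at the same site.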

  22. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - Y E - - -
      s3  - - Y I - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -
      Task in bioinformatics: Find best multiple alignment for given sequence set

  23. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - Y E - - -
      s3  - - Y I - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -
      Astronomical number of possible alignments!

  24. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - - - Y E -
      s3  Y I - - - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -
      Astronomical number of possible alignments!
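To get a feeling for "astronomical", the number of possible alignments can be counted even for just two sequences. The sketch below assumes that every column is either a residue pair, a residue over a gap, or a gap over a residue, and that alignments differing only in the order of adjacent gap columns count as different. The function name is hypothetical.

```python
def count_pairwise_alignments(m: int, n: int) -> int:
    """Number of ways to align two sequences of lengths m and n.

    table[i][j] counts the alignments of the first i and first j residues;
    the last column is a pair, a gap in the second row, or a gap in the first.
    """
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = (
                table[i - 1][j - 1]  # last column pairs two residues
                + table[i - 1][j]    # last column: residue over gap
                + table[i][j - 1]    # last column: gap over residue
            )
    return table[m][n]


for length in (1, 2, 3, 10, 13):
    print(length, count_pairwise_alignments(length, length))
```

The count grows exponentially with sequence length, and more steeply still when more than two sequences are aligned.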

  25. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - - - Y E -
      s3  Y I - - - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -
      Computer has to decide: which one is best?

  26. Tools for multiple sequence alignment
      Questions in development of alignment programs:
      (1) What is a good alignment? → objective function ("score")
      (2) How to find a good alignment? → optimization algorithm
      First question far more important!

  27. Tools for multiple sequence alignment Before defining an objective function (scoring scheme) • What is a biologically good alignment ??

  28. Tools for multiple sequence alignment Criteria for alignment quality: • 3D-Structure: align residues at corresponding positions in 3D structure of protein!

  29. Tools for multiple sequence alignment Criteria for alignment quality:

  30. Tools for multiple sequence alignment Criteria for alignment quality: • 3D-Structure: align residues at corresponding positions in 3D structure of protein!

  31. Tools for multiple sequence alignment Species related by common history

  32. Tools for multiple sequence alignment Genes / proteins related by common history

  33. Tools for multiple sequence alignment Criteria for alignment quality: • 3D-Structure: align residues at corresponding positions in 3D structure of protein! • Evolution: align residues with common ancestors!

  34. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - Y E - - -
      s3  - - Y I - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -
      Alignment hypothesis about sequence evolution
      • Mismatches correspond to substitutions
      • Gaps correspond to insertions/deletions

  35. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - Y E - - -
      s3  - - Y I - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -
      Alignment hypothesis about sequence evolution
      • Search for most plausible scenario!
      • Estimate probabilities for individual evolutionary events: insertions/deletions, substitutions

  36. Tools for multiple sequence alignment
      s1  - R Y I - M R E A Q Y E S A Q
      s2  - R C I V M R E A - Y E - - -
      s3  - Y - I - M Q E V Q Q E R - -
      s4  W R Y I A M R E - Q Y E - - -
      Alignment hypothesis about sequence evolution
      • Search for most plausible scenario!
      • Estimate probabilities for individual evolutionary events: insertions/deletions, substitutions

  37. Tools for multiple sequence alignment Compute score s(a,b) for degree of similarity between amino acids a and b based on probability p_{a,b} of substitution a → b (or b → a) (Extremely simplified!)
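One common way to turn such probabilities into a score is a log-odds ratio. The transcript does not preserve the formula used in the lecture, so the form below is an assumption for illustration only; the background frequencies q_a and q_b of the two amino acids are symbols introduced here, not taken from the slides.

```latex
s(a,b) = \log \frac{p_{a,b}}{q_a \, q_b}
```

Under this form the score is positive when a and b are paired more often than expected by chance, and negative when they are paired less often.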

  38. Tools for multiple sequence alignment

  39. Tools for multiple sequence alignment Reason for different substitution probabilities p_{a,b}: • Different physical and chemical properties of amino acids • Amino acids with similar properties more likely to be substituted for each other

  40. Tools for multiple sequence alignment Use penalty for gaps introduced into alignment • Simplest approach: linear gap costs: penalty proportional to gap length • Non-linear gap penalties more realistic: long gap caused by single insertion/deletion • Most frequently used: affine linear gap penalties: more realistic, but efficient to calculate!
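The difference between the two most common gap cost models can be sketched as follows. Parameter names and values are hypothetical, and the affine convention used here (the opening cost covers the first gap position) is one of several in use.

```python
def linear_gap_cost(length: int, per_position: float) -> float:
    """Linear model: penalty proportional to gap length."""
    return per_position * length


def affine_gap_cost(length: int, gap_open: float, gap_extend: float) -> float:
    """Affine model: one opening cost plus a smaller cost per further position.

    Convention assumed here: gap_open pays for the first position.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


# Hypothetical parameter values, chosen only to show the shape of both models.
for k in (1, 2, 5, 10):
    print(k, linear_gap_cost(k, 2.0), affine_gap_cost(k, 10.0, 1.0))
```

With these values one gap of length 10 costs 19.0 under the affine model, while ten separate gaps of length 1 cost 100.0, which reflects the idea that a long gap is usually caused by a single insertion or deletion.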

  41. Traditional objective functions: Define score of alignments as • Sum of individual similarity scores s(a,b) • Minus gap penalties. Needleman-Wunsch scoring system for pairwise alignment (1970)

  42. Pair-wise sequence alignment
      T Y W I V
      T - - L V
      Example: Score = s(T,T) + s(I,L) + s(V,V) - 2g
      Assumption: linear gap penalty!
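The example score can be reproduced with a short sketch. The similarity values in `toy_score` are invented for illustration (they are not taken from any real substitution matrix), and g = 1 is an arbitrary choice.

```python
def toy_score(a: str, b: str) -> int:
    """Hypothetical similarity scores: 2 for identity, 1 for I/L, -1 otherwise."""
    if a == b:
        return 2
    if {a, b} == {"I", "L"}:
        return 1  # I and L treated as similar residues in this toy scheme
    return -1


def alignment_score(row_a: str, row_b: str, score, gap: int) -> int:
    """Sum of similarity scores minus a linear penalty of `gap` per gap column."""
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total -= gap
        else:
            total += score(a, b)
    return total


# s(T,T) + s(I,L) + s(V,V) - 2g with the toy values above and g = 1
example = alignment_score("TYWIV", "T--LV", toy_score, gap=1)
print(example)  # 2 + 1 + 2 - 2 = 3
```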

  43. Pair-wise sequence alignment
      T Y W I V
      T - - L V
      Dynamic-programming algorithm finds alignment with best score. (Needleman and Wunsch, 1970)

  44. Pair-wise sequence alignment
      T Y W I V
      T - - L V
      • Running time proportional to product of sequence lengths
      • Time complexity O(l1 * l2)
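A minimal sketch of the dynamic-programming idea with a linear gap penalty is given below. It fills a table of (l1 + 1) * (l2 + 1) cells, which is where the O(l1 * l2) running time comes from. The scoring function `toy_score` and the gap value are hypothetical, and this is a textbook-style illustration rather than the code of any particular program.

```python
def toy_score(a: str, b: str) -> int:
    """Hypothetical similarity scores: 2 for identity, 1 for I/L, -1 otherwise."""
    if a == b:
        return 2
    if {a, b} == {"I", "L"}:
        return 1
    return -1


def needleman_wunsch(x: str, y: str, score, gap: int):
    """Global alignment with linear gap penalty; returns (score, row_x, row_y)."""
    n, m = len(x), len(y)
    # best[i][j] = best score for aligning x[:i] with y[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for j in range(1, m + 1):
        best[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # pair residues
                best[i - 1][j] - gap,                            # gap in y
                best[i][j - 1] - gap,                            # gap in x
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_x, row_y = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and best[i][j] == best[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and best[i][j] == best[i - 1][j] - gap:
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return best[n][m], "".join(reversed(row_x)), "".join(reversed(row_y))


result = needleman_wunsch("TYWIV", "TLV", toy_score, gap=1)
print(result)
```

With these toy scores the sketch recovers the alignment shown on the slide, T Y W I V over T - - L V.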

  45. Pair-wise sequence alignment • Algorithm for pairwise alignment can be generalized to multiple alignment of N sequences • Time complexity O(l1 * l2 * … * lN) • Not feasible in practice (running time too long!) • Heuristic necessary, i.e. fast algorithm that does not necessarily produce mathematically best alignment
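A quick calculation shows why the generalized algorithm is impractical. The sketch counts only the cells of the N-dimensional dynamic-programming table, (l1 + 1) * (l2 + 1) * … * (lN + 1); the sequence lengths used are hypothetical, and the real cost is higher still because each cell has several predecessors to inspect.

```python
import math


def dp_cells(lengths) -> int:
    """Number of cells in the DP table for sequences of the given lengths."""
    return math.prod(length + 1 for length in lengths)


# Hypothetical example: sequences of length 100 each.
for n_sequences in (2, 3, 5, 10):
    cells = dp_cells([100] * n_sequences)
    print(n_sequences, f"{cells:.2e}")
```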

  46. "Progressive" Alignment Most popular approach to (global) multiple sequence alignment: • Progressive Alignment Since mid-Eighties: Feng/Doolittle, Higgins/Sharp, Taylor, …

  47. "Progressive" Alignment
      WCEAQTKNGQGWVPSNYITPVN
      WWRLNDKEGYVPRNLLGLYP
      AVVIQDNSDIKVVPKAKIIRD
      YAVESEAHPGSFQPVAALERIN
      WLNYNETTGERGDFPGTYVEYIGRKKISP

  48. "Progressive" Alignment
      WCEAQTKNGQGWVPSNYITPVN
      WWRLNDKEGYVPRNLLGLYP
      AVVIQDNSDIKVVPKAKIIRD
      YAVESEAHPGSFQPVAALERIN
      WLNYNETTGERGDFPGTYVEYIGRKKISP
      Guide tree

  49. "Progressive" Alignment
      WCEAQTKNGQGWVPSNYITPVN
      WW--RLNDKEGYVPRNLLGLYP-
      AVVIQDNSDIKVVP--KAKIIRD
      YAVESEASFQPVAALERIN
      WLNYNEERGDFPGTYVEYIGRKKISP
      Profile alignment, "once a gap - always a gap"
