
Multiple Sequence Alignment by Iterative Tree-Neighbor Alignments



Presentation Transcript


  1. Multiple Sequence Alignment by Iterative Tree-Neighbor Alignments • Susan Bibeault • June 9, 2000

  2. Outline • Problem Statement and Importance • Terminology • Current Approaches • Our Alignment Heuristic • Performance Results • Conclusions • Future Work


  4. Multiple Sequence Alignment
     • Problem: given a sequence set, insert gaps into the sequences so that evolutionarily conserved regions are aligned
     • Given sequence set:
         VLSPADNVKAAWGKVGAHAGEYGAEALERMF
         VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVY
         GLSDGEWQLVLNVWGKVEADIPGHVLIRLFK
     • Two alignments of this set shown on the slide:
         V-LSPADN--VKAAWGKVGAHAGEYGAEALERM---F-
         VHLTPEEKSAVTALWGKVNVD--EVGGEALGRLLVVYP
         G-LSDGEWQLVLNVWGKVEA---DIPGHVLIRL---FK

         -VLSPADN--VKAAWGKVGAHAGEYGAEALERMF----
         VHLTPEEKSAVTALWGKVNVD--EVGGEALGRLLVVYP
         -GLSDGEWQLVLNVWGKVEA---DIPGHVLIRLFK---
     • Important tool
     • Relate Homologous Proteins
     • Discover Conserved Regions


  6. Scoring Multiple Alignments • Sum of Pairs: Σ cost(i,j) (slide example: Σ cost(i,j) = 6) • Tree based: Σ cost(edge) (slide example: Σ cost(edge) = 1) • Example tree: gorilla, human, orangutan, chimpanzee, gibbon
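The two scoring schemes on this slide can be sketched in a few lines. This is only an illustration: the unit costs in `pair_cost` (0 for identical symbols, 1 for a mismatch or a residue against a gap) and the function names are assumptions, not the cost criterion used in the talk.

```python
from itertools import combinations


def pair_cost(a: str, b: str) -> int:
    """Hypothetical unit cost: 0 if the symbols are identical, else 1."""
    return 0 if a == b else 1


def pairwise_cost(row_a: str, row_b: str) -> int:
    """Cost of two aligned rows, summed column by column."""
    return sum(pair_cost(a, b) for a, b in zip(row_a, row_b))


def sum_of_pairs(rows: list[str]) -> int:
    """Sum-of-pairs score: cost over every unordered pair of rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    return sum(pairwise_cost(a, b) for a, b in combinations(rows, 2))


def tree_cost(rows: dict[str, str], edges: list[tuple[str, str]]) -> int:
    """Tree-based score: cost summed only over pairs joined by a tree edge."""
    return sum(pairwise_cost(rows[u], rows[v]) for u, v in edges)


rows = {"x": "AC-", "y": "ACG", "z": "A-G"}
sp = sum_of_pairs(list(rows.values()))
tc = tree_cost(rows, [("x", "y"), ("y", "z")])
```

The sum-of-pairs score counts every pair of rows, so a single divergent sequence is penalised against all the others, while the tree-based score only charges for differences along the edges of the given tree.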

  7. Scoring • Cost Matrix: C(aa1, aa2) • Gap Penalties • Simple: C(aa, -) • Affine: C(-) + Len * C(aa, -) • Alignments (example sequences): VLSPADNVKA, GLSDGEWQLVL • Cost(s[1..i], t[1..j]) = min( Cost(s[1..i], t[1..j-1]) + g, Cost(s[1..i-1], t[1..j-1]) + C(s[i], t[j]), Cost(s[1..i-1], t[1..j]) + g )
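A minimal sketch of the pairwise recurrence and the affine gap formula from this slide. It assumes the intended form is a minimum-cost recurrence in which the gap penalty g and the substitution cost C are added, with both prefixes starting at position 1. The default unit substitution cost and the function names are hypothetical.

```python
from typing import Callable, Optional


def affine_gap_cost(length: int, open_cost: int, extend_cost: int) -> int:
    """Affine penalty from the slide: C(-) + Len * C(aa, -)."""
    return open_cost + length * extend_cost


def alignment_cost(s: str, t: str, g: int = 2,
                   C: Optional[Callable[[str, str], int]] = None) -> int:
    """Minimum global alignment cost with a simple (linear) gap penalty g."""
    if C is None:
        # Assumed substitution cost: 0 for a match, 1 for a mismatch.
        C = lambda a, b: 0 if a == b else 1
    rows, cols = len(s) + 1, len(t) + 1
    cost = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, rows):
        cost[i][0] = i * g
    for j in range(1, cols):
        cost[0][j] = j * g
    for i in range(1, rows):
        for j in range(1, cols):
            cost[i][j] = min(
                cost[i][j - 1] + g,                        # gap in s
                cost[i - 1][j - 1] + C(s[i - 1], t[j - 1]),  # (mis)match
                cost[i - 1][j] + g,                        # gap in t
            )
    return cost[-1][-1]
```

For example, `alignment_cost("ABCD", "ABD")` needs exactly one gap and no mismatches, so with the default `g = 2` it returns 2.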


  9. Current Approaches
     • Global Alignment
         ABCDEFGHI
         :::: ::::
         ABCD-FGHI
     • Local Alignment
         XXXABCDYYY
            ::::
         ZZZABCDEEEE
     • Global Methods
     • Optimal Algorithms (MSA, MWT, MUSEQAL)
     • Progressive (MULTALIGN, PILEUP, CLUSTAL, MULTAL, AMULT, DFALIGN, MAP, PRRP, AMPS)
     • Local Methods
     • PIMA, DIALIGN, PRALIGN, MACAW, BlockMaker, Iteralign
     • Combined (GENALIGN, ASSEMBLE, DCA)
     • Statistical (HMMT, SAGA, SAM, Match Box)
     • Parsimony (MALIGN, TreeAlign)


  11. Our Heuristic • Distance Estimation • Tree Construction • Node Initialization • Tree Partitioning • Iteration

  12. Estimation of Protein Distance
      • Aligned Sequences → Estimated Pair Distances
      • Aligned sequences:
          PEAAALYGRFT---IKSDVW
          PESAALYGRFT---IKSDVW
          PESLALYNKF---SIKSDVW
          PEALNYGRY----SSESDVW
          PEALNYGWY----SSESDVW
          PEVIRMQDDNPFSFSQSDVY
      • Issue: Implied vs. Optimal Pair Alignments
      • Two rows taken from the multiple alignment:
          PESLALYNKF---SIKSDVW
          PEALNYGRY----SSESDVW
      • Implied pair alignment (columns that are gaps in both rows removed):
          PESLALYNKFSIKSDVW
          PEALNYGRY-SSESDVW
      • Optimal pair alignment:
          PESLALYNKFSIKSDVW
          PEAL-NYGRYSSESDVW
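The implied pair alignment can be read off a multiple alignment mechanically: take the two rows and drop every column in which both contain a gap. The sketch below does that for the two rows shown on the slide; the function name `implied_pair` is hypothetical.

```python
def implied_pair(row_a: str, row_b: str, gap: str = "-") -> tuple[str, str]:
    """Project two rows of a multiple alignment onto a pairwise alignment.

    Columns where both rows hold a gap carry no information about the pair,
    so they are removed; every other column is kept unchanged.
    """
    kept = [(a, b) for a, b in zip(row_a, row_b)
            if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


a, b = implied_pair("PESLALYNKF---SIKSDVW", "PEALNYGRY----SSESDVW")
print(a)
print(b)
```

The result is the pair `PESLALYNKFSIKSDVW` / `PEALNYGRY-SSESDVW`, which differs from the separately computed pair `PESLALYNKFSIKSDVW` / `PEAL-NYGRYSSESDVW` also shown on the slide. That difference is the issue the slide raises: a distance estimated from the implied pair need not equal the distance from the optimal pair.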

  13. Optimal Pair vs. Implied Pair

  14. Interior Node Classification • Interior Nodes Classified by Percent Identity • PID = (# matched residues) / (# total residues) • User Specified Tiers • User Specified Cost Criterion • Example: • PID > 60% -- PAM 40 – High Gap Penalties • PID > 40% -- PAM 120 – Medium Gap Penalties • PID < 40% -- PAM 200 – Low Gap Penalty
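A small sketch of the tier lookup in the example above. Two points are assumptions rather than statements from the slides: the PID denominator is taken to be the number of aligned columns that are not gaps in both rows, and a PID of exactly 40% falls into the lowest tier, since the slide leaves that boundary unspecified. The names `percent_identity`, `TIERS`, and `classify` are hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """PID in percent over columns that are not gap-gap (assumed denominator)."""
    columns = [(a, b) for a, b in zip(row_a, row_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    # a == b can only be a residue match here: gap-gap columns were dropped.
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# User-specified tiers, highest threshold first: (PID above, matrix, gap level).
TIERS = [
    (60.0, "PAM 40", "high"),
    (40.0, "PAM 120", "medium"),
]
DEFAULT_TIER = ("PAM 200", "low")


def classify(pid: float) -> tuple[str, str]:
    """Return (cost matrix, gap penalty level) for an interior node's PID."""
    for threshold, matrix, gap_level in TIERS:
        if pid > threshold:
            return matrix, gap_level
    return DEFAULT_TIER


pid = percent_identity("PESLALYNKFSIKSDVW", "PEAL-NYGRYSSESDVW")
tier = classify(pid)
```

For the pair above, 9 of 17 columns match, so the PID is about 53% and the node falls into the middle tier (PAM 120, medium gap penalties).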

  15. Ordering Alignments • Isolate Sub Trees • Threshold PID • Order Alignments • Sub Tree • Border Nodes • Integrate All

  16. Interior Alignments • Sum of Pairs • Bounded Search • Implementation • Modular • Reentrant • Flexible Cost Criterion

  17. Generating Consensus • Alignment (A1, A2, A3) • Consensus X • Min( Σ Di(Ai, X) ) • For each position i: Xi = the symbol a that minimizes cost(a, A1i) + cost(a, A2i) + cost(a, A3i) • Diagram: X joined to A1, A2, A3 by distances D1, D2, D3
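The per-position rule on this slide can be sketched as a column-wise minimisation. This is an illustration under assumptions: the candidate alphabet, the unit cost function, and the tie-breaking rule (first minimal symbol in alphabet order) are all hypothetical choices, not details given in the talk.

```python
from typing import Callable, Sequence


def unit_cost(a: str, b: str) -> int:
    """Assumed cost: 0 for identical symbols, 1 otherwise."""
    return 0 if a == b else 1


def consensus(rows: Sequence[str],
              alphabet: str,
              cost: Callable[[str, str], int] = unit_cost) -> str:
    """Build X column by column: X_i minimises the summed cost to every A_k,i."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    result = []
    for column in zip(*rows):
        # min() returns the first minimal candidate, so ties follow alphabet order.
        best = min(alphabet, key=lambda x: sum(cost(x, a) for a in column))
        result.append(best)
    return "".join(result)


x = consensus(["ACG", "ACT", "A-T"], alphabet="ACGT-")
```

With the unit cost this reduces to a majority vote per column; substituting a PAM-derived cost function would let the consensus pick a residue that is close to all three neighbours even when none of them agree.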


  19. Testing the Method • BAliBASE benchmark • “Correct” Alignments • Core Blocks of Conserved Motifs • Typical “Hard Problem” Sets • Protein Parsimony • Measures “Evolutionary Steps” of Alignment

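  20. Baseline: BAliBASE SP [chart; "better" direction marked]

  21. Baseline: BAliBASE TC [chart; "better" direction marked]

  22. Baseline: ProtPars [chart; "better" direction marked]

  23. Orphans/Families: BAliBASE SP [chart; "better" direction marked]

  24. Orphans/Families: ProtPars [chart; "better" direction marked]

  25. Larger Families [chart; "better" direction marked]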


  27. Conclusions • Solution Quality • Captures Evolutionary Information • Iterations Converge Quickly • Useful Tool


  29. Future Work • Improved Alignment Consensus • Multiple Partitioning Thresholds • Multiple Solutions • Integrated Phylogeny Modifications • Parallel Implementation
