
BioInfoSummer07 ICE-EM Summer Symposium in BioInformatics

BioInfoSummer07 ICE-EM Summer Symposium in BioInformatics, Dec 10-14, Australian National University, Canberra, Australia. CLePAPS: Fast Pair Alignment of Protein Structures Based on Conformational Letters. Sheng WANG, Wei-Mou ZHENG*


Presentation Transcript


  1. BioInfoSummer07 ICE-EM Summer Symposium in BioInformatics, Dec 10-14, Australian National University, Canberra, Australia. CLePAPS: Fast Pair Alignment of Protein Structures Based on Conformational Letters. Sheng WANG, Wei-Mou ZHENG*, Institute of Theoretical Physics, CAS. zheng@itp.ac.cn. *To whom correspondence should be addressed.

  2. Outline • [1] Introduction • [2] The flow chart of the CLePAPS algorithm • [2-1] Find SFPs by CLeSUM • [2-2] Construct the ‘Star-Tree’ • [2-3] The ‘Zoom-In’ strategy • [3] Results & Discussion

  3. Chapter[1]: Introduction • Structure alignment is a self-consistent problem: the residue correspondence determines the rigid transformation, and the rigid transformation determines the correspondence. • However, when we start aligning two protein structures we know neither the transformation nor the correspondence. • Existing methods: DALI, CE; VAST; STRUCTAL, ProSup. • CLePAPS: Conformational Letters based Pairwise Alignment of Protein Structures. • Initialization + iteration. • Similar Fragment Pairs (SFPs). • Anchor-based. • Alignment = as many consistent SFPs as possible.

  4. Chapter[1]: Introduction. [Figure: anchor-based superposition. After superimposing the two structures on the anchor SFP, each remaining SFP is marked as consistent or inconsistent.] Alignment = collect as many consistent SFPs as possible.

  5. Chapter[1]: Introduction. Structure alignment as a self-consistent problem. [Flow chart: Protein A and Protein B → initial correspondence (anchor SFP) → optimal transformation for the correspondence → converged? If yes, end. If no, update the correspondence (adding consistent SFPs) and recompute the transformation.]

  6. Chapter[1]: Introduction. Four main problems: [1] How can we find SFPs as fast as possible? [2] How can we balance the specificity and sensitivity of the SFPs found? [3] How can we avoid starting in a Local Trap? [4] How can we speed up convergence without falling into a Local Trap?

  7. An example of LOCAL TRAP

  8. Chapter[2]: The flow chart of the CLePAPS algorithm. Part I (SFP): find SFPs by CLeSUM. Part II (‘Star-Tree’, specificity): from the SFP list of width 20, construct the Star-Tree using the top K SFPs as anchor candidates and the top J SFPs as neighbors, then select the Optimal Anchor SFP as the initial correspondence. Part III (‘Zoom-In’, sensitivity): using the SFP list of width 8, update the correspondence three times by blank-filling with cutoffs d1, d2 and d3 (first, second and third update), adding consistent SFPs so as to speed up convergence without falling into a Local Trap. Output: final alignment.

  9. Chapter[2-1]: Find SFPs by CLeSUM (Part I: SFP). Hint: SFP = Similar Fragment Pair; CLeSUM = Conformational Letter SUbstitution Matrix.

  10. Chapter[2-1]: Find SFPs by CLeSUM. The main difference between CLePAPS and other existing structure alignment algorithms is its use of conformational letters. Conformational letters = discretized states of 3D segmental conformations. A letter = a cluster of combinations of the three angles formed by the Cα pseudobonds of four contiguous residues (obtained by clustering according to the probability distribution). Fig. 1: Centers of the 17 conformational letters.

  11. Chapter[2-1]: Find SFPs by CLeSUM. Similarity between conformational letters. CLeSUM: Conformational Letter SUbstitution Matrix, M_ij = 20 · log2(P_ij / (P_i P_j)), constructed using FSSP representatives; comparable to BLOSUM83, H ≈ 1.05. The scores carry evolutionary + geometric information. [Figure labels: typical helix, typical sheet.]
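The log-odds formula on this slide can be illustrated with a minimal sketch. The slide gives only the formula, so everything else here is an assumption: the function name `log_odds_matrix`, the symmetric counting of aligned letter pairs, and the rounding to integers are illustrative choices, not the authors' procedure for building CLeSUM from FSSP.

```python
import math
from collections import Counter

def log_odds_matrix(aligned_pairs, scale=20.0):
    """Return {(a, b): round(scale * log2(P_ab / (P_a * P_b)))}.

    aligned_pairs: iterable of (letter, letter) taken from structurally
    aligned positions. Pairs that never occur get no entry (assumption:
    the caller decides how to score unobserved pairs).
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        # Count both orders so the resulting matrix is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Marginal count of each letter, taken from the first slot of each pair.
    letter_counts = Counter()
    for (a, _b), n in pair_counts.items():
        letter_counts[a] += n

    matrix = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total
        p_a = letter_counts[a] / total
        p_b = letter_counts[b] / total
        matrix[(a, b)] = round(scale * math.log2(p_ab / (p_a * p_b)))
    return matrix

# Toy data (made up): helix letters mostly align with helix letters.
toy_pairs = [("H", "H")] * 6 + [("E", "E")] * 3 + [("H", "E")]
toy_matrix = log_odds_matrix(toy_pairs)
```

With this toy input, pairs that co-occur more often than chance get positive scores and the rare H/E pair gets a negative one, which is the behavior a substitution matrix of this form is meant to have.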

  12. Chapter[2-1]: Find SFPs by CLeSUM • SFP => highly scored string pair. • Fast search for SFPs by string comparison. • The CLeSUM similarity score measures the importance of an SFP. • Guided by CLeSUM scores, only the top few SFPs need to be examined to determine the superposition for alignment, so a reliable greedy strategy becomes possible. [Figure: Protein A and Protein B (the smaller one), with a seed marking a similar fragment pair.]

  13. An example of finding SFPs (1cewI aligned with 1molA). To find SFPs, we take the shorter sequence as the template and record every pair of positions whose fragments of a given length score higher than the threshold. >1molA RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR >1cewI RRCECECAJGBIHHHHHHHHIIHHHIIGPGBLDFFCPLDPLEEFEDPOLCEEEEEEDEFDEAGCAKLAJGKHHIIMNGKLQQQDEEEDEEEEEBPKKOGEEDPLEEER Similar Fragment Pairs (1molA / 1cewI): FEDECCGA / OLCEEEEE; FEDPLDEQ / EEDPLEEE; PLDDEEED / PLEEFEDP; CEDEEEEE / EEDEEEEE; HHHHHHHH / AJGKHHII. [Figure: the five SFPs are labeled with score ranks 1 to 5.]
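The search described on this slide can be sketched as a plain sliding-window comparison of the two conformational-letter strings. This is only a sketch under stated assumptions: the real CLeSUM values are not given in the slides, so `toy_score` is a hypothetical stand-in (match +2, mismatch -1), and the names `find_sfps`, `width` and `threshold` are illustrative.

```python
def toy_score(x, y):
    # Hypothetical stand-in for a CLeSUM lookup; NOT the real matrix.
    return 2 if x == y else -1

def find_sfps(seq_a, seq_b, score, width=8, threshold=0):
    """Return [(score, i, j), ...] for every ungapped window pair of the
    given width whose summed letter score reaches the threshold, sorted
    by descending score (ties broken by position)."""
    hits = []
    for i in range(len(seq_a) - width + 1):
        for j in range(len(seq_b) - width + 1):
            s = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(width))
            if s >= threshold:
                hits.append((s, i, j))
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return hits

# Conformational-letter strings copied from the slide.
MOL_A = ("RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEE"
         "NOGCEDEEEEEEPKKOGFEDPLDEQBGCCR")
CEW_I = ("RRCECECAJGBIHHHHHHHHIIHHHIIGPGBLDFFCPLDPLEEFEDPOLCEEEEEEDEFDEAGC"
         "AKLAJGKHHIIMNGKLQQQDEEEDEEEEEBPKKOGEEDPLEEER")

# The shorter string (1molA) is used as the template, as on the slide.
sfp_list = find_sfps(MOL_A, CEW_I, toy_score, width=8, threshold=13)
for s, i, j in sfp_list[:5]:
    print(s, MOL_A[i:i + 8], CEW_I[j:j + 8])
```

Because the toy score only rewards identical letters, its ranking will differ from the ranking on the slide; a real substitution matrix would also reward pairs of distinct but geometrically similar letters.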

  14. Chapter[2-2]: Construct the ‘Star-Tree’ (Part II). Hint: SFP list (width 20) => we create a list of SFPs of length 20 and sort them by CLeSUM score. Top_K & Top_J (J > K) => only the Top_K SFPs of the list are tried as Anchor SFPs; the consistency of each is checked against the Top_J SFPs, which serve as neighbors.

  15. Chapter[2-2]: Construct the ‘Star-Tree’. Example: selection of the Optimal Anchor SFP with Top K, K = 2 and Top J, J = 5. [Figure: five SFPs labeled by score rank.] With the Top_1 SFP as anchor, the number of consistent SFPs is 4; with the Top_2 SFP as anchor, it is 1. The Top_1 SFP is globally supported by three other SFPs, while the Top_2 SFP is supported only by itself.
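The anchor selection in this example can be sketched as a simple support count. The geometric consistency test (superimpose on the anchor, then check whether another SFP's fragments land close together) is not spelled out in the slides, so it is passed in as a caller-supplied function; `select_anchor`, `is_consistent` and the toy "frame" labels below are all hypothetical.

```python
def select_anchor(ranked_sfps, is_consistent, top_k=2, top_j=5):
    """Try each of the top_k SFPs as anchor, count how many of the top_j
    SFPs are consistent with it, and return (best_anchor, support).

    ranked_sfps: SFPs sorted by descending score.
    is_consistent(anchor, sfp): caller-supplied test; it should return
    True for (anchor, anchor) so that an anchor supports itself.
    """
    best_anchor, best_support = None, -1
    for anchor in ranked_sfps[:top_k]:
        support = sum(1 for sfp in ranked_sfps[:top_j]
                      if is_consistent(anchor, sfp))
        if support > best_support:  # ties keep the higher-ranked anchor
            best_anchor, best_support = anchor, support
    return best_anchor, best_support

# Toy stand-in for the geometry: each SFP carries a made-up "frame" label,
# and two SFPs count as consistent when they share the same frame.
toy_sfps = [("rank1", "A"), ("rank2", "B"), ("rank3", "A"),
            ("rank4", "A"), ("rank5", "A")]
same_frame = lambda anchor, sfp: anchor[1] == sfp[1]
anchor, support = select_anchor(toy_sfps, same_frame, top_k=2, top_j=5)
```

The toy labels are arranged to mirror the slide: the rank-1 SFP collects 4 consistent SFPs (itself plus three others) and the rank-2 SFP only 1, so rank 1 is chosen.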

  16. An example of ‘Star-Tree’ construction (1cewI aligned with 1molA). [Figure: ‘Star-Tree’ view.] With the Top_1 SFP as anchor, # of consistent SFPs = 4; with the Top_2 SFP as anchor, # of consistent SFPs = 1.

  17. Chapter[2]: The flow chart of the CLePAPS algorithm (flow chart repeated, with the Star-Tree result of the example filled in: Top 1 has 4 consistent SFPs, Top 2 has 1).

  18. Chapter[2-3]: The ‘Zoom-In’ strategy (Part III). Hint: SFP list (width 8) => we create a list of SFPs of length 8 and sort them by CLeSUM score in descending order. Blank-filling => we add consistent SFPs one by one from the SFP list (width 8) to update the correspondence.

  19. Chapter[2-3]: The ‘Zoom-In’ strategy. Example: d1 > d2 > d3, with d1 = 8 Å, d2 = 6 Å, d3 = 5 Å. [1] The first transformation is determined by the Optimal Anchor SFP alone, so we use a large cutoff d1 to avoid a LOCAL TRAP. [2] The later transformations are determined by a set of globally consistent SFPs, so we use lower cutoffs when adding new consistent SFPs.
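The shrinking-cutoff idea can be sketched as a three-round loop. The slides do not say how the transformation is fitted or whether SFPs admitted in an earlier round are re-checked later, so this sketch makes assumptions: `fit` and `deviation` are caller-supplied functions, every candidate is re-tested in every round, and the 1D "translation only" geometry in the usage part is a toy, not the rigid 3D superposition a real implementation would use.

```python
def zoom_in(anchor, candidates, fit, deviation, cutoffs=(8.0, 6.0, 5.0)):
    """Refine a correspondence with successively tighter distance cutoffs.

    fit(sfps) -> transformation estimated from the current correspondence.
    deviation(transformation, sfp) -> distance left after superposition.
    Returns the SFPs kept after the last round.
    """
    aligned = [anchor]
    for cutoff in cutoffs:
        transform = fit(aligned)  # round 1 uses the anchor alone
        selected = [c for c in candidates
                    if deviation(transform, c) <= cutoff]
        if not selected:          # nothing fits: keep the previous set
            break
        aligned = selected
    return aligned

# Toy 1D geometry (assumption): an SFP is (x, y) and the "transformation"
# is the translation t that best maps x onto y on average.
def fit_shift(sfps):
    return sum(y - x for x, y in sfps) / len(sfps)

def shift_deviation(t, sfp):
    x, y = sfp
    return abs(x + t - y)

toy_candidates = [(0, 10), (5, 15), (20, 37), (30, 60)]
kept = zoom_in((0, 10), toy_candidates, fit_shift, shift_deviation)
```

In the toy run, the loose first cutoff admits the slightly off SFP (20, 37) that an anchor-only fit with a tight cutoff would have rejected, while the outlier (30, 60) is excluded in every round.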

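  20. An example of the ‘Zoom-In’ strategy. d1 > d2 > d3, with d1 = 8 Å, d2 = 6 Å, d3 = 5 Å. [Figure: first update (cutoff d1), second update (cutoff d2), third update (cutoff d3), with elongation and shrinking of aligned fragments, leading to the final alignment.]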

  21. Chapter[2]: The flow chart of the CLePAPS algorithm (flow chart repeated, now including Part III: first, second and third update by blank-filling with cutoffs d1, d2 and d3, followed by the final alignment).

  22. Chapter[3]: Results & Conclusion. Four main problems and the CLePAPS solutions. [1] How can we find SFPs as fast as possible? Fast search for SFPs by string comparison alone. [2] How can we balance the specificity and sensitivity of the SFPs found? Width 20 for specificity and width 8 for sensitivity, both sorted by CLeSUM score. [3] How can we avoid starting in a Local Trap? The Optimal Anchor SFP is selected through the ‘Star-Tree’. [4] How can we speed up convergence without falling into a Local Trap? The fast ‘Zoom-In’ strategy converges within three updates.

  23. Chapter[3]: Results & Conclusion • The Fischer benchmark test • Database search with CLePAPS • Multiple alignment solutions: symmetry, domain move, repeats • Non-topological alignment and domain shuffling [pdb:1ihwA] [pdb:1ssoA]

  24. Multi-Solution[1] : Symmetry Red structure fixed [pdb:4fgf] [OGCCFEFAHOGEED] [OGDCEDFAIOGEED] [KGFCEDDAJOGCCC] [pdb:4fgf][pdb:8i1b] Solution [A] Solution [B] Solution [C]

  25. Multi-Solution[2] : Domain Move Blue structure fixed [pdb:2gbp][pdb:2liv] Domain_1 Domain_2 Solution [A] Solution [B]

  26. Multi-Solution[3] : Repeats Blue structure fixed [pdb:4cpv][pdb:1osa] Repeat_1 Repeat_2 Solution [A] Solution [B]

  27. Chapter[3]: Results & Conclusion • Conclusion • CLePAPS distinguishes itself from other existing algorithms for pairwise structure alignment by its use of conformational letters. • Conformational letters aptly balance precision with simplicity. • CLeSUM is a proper measure of similarity between states. • CLeSUM, extracted from the FSSP database, contains statistical information from the structure database, which reduces the chance of accidentally matching two irrelevant helices: evolutionary + geometric = specificity gain. • For example, two frequent helix letters are geometrically very similar, but their score is relatively low. • The CLeSUM similarity score can be used to rank the importance of SFPs for a greedy algorithm. Only the top few SFPs need to be examined.

  28. Chapter[3]: Results & Conclusion. 1. Fast search for SFPs by string comparison alone. 2. Width 20 for specificity + width 8 for sensitivity. 3. Optimal Anchor SFP selected by checking consistency. 4. Local Trap avoided by ‘Zoom-In’. The running time for the 68 pairs of the Fischer benchmark is less than 2% of that of the downloaded local version of CE. Next steps: 1. BLOMAPS: fast multiple structure alignment; SFPs → Highly Similar Fragment Blocks (HSFBs). 2. Include biochemical information in CLeSUM by amino acid clustering. Entropic clustering: AVCFIWLMY (h) + DEGHKNPQRST (p).

  29. Thank you

  30. Converting a structure to conformational letters, from the N-terminal to the C-terminal. Step 1: get four contiguous Cα atoms. Step 2: get the two bending angles θ and θ’ and the torsion angle τ. Step 3: select the most similar of the 17 states. Step 4: assign the code. Result: >1molA RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR
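Steps 1 and 2 can be sketched with plain vector arithmetic. The slides do not fix the angle conventions, so two assumptions are made here: each bending angle is measured at the middle atom between its two pseudobonds, and the torsion angle uses the common atan2 dihedral formula. The function names are illustrative. Step 3 is left out because it needs the 17 cluster centers, which are not listed in the slides.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bend_angle(p_prev, p_mid, p_next):
    """Angle in degrees at p_mid between the pseudobonds to its neighbors."""
    u, v = _sub(p_prev, p_mid), _sub(p_next, p_mid)
    c = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding

def torsion_angle(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], about the p1-p2 pseudobond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))

def segment_angles(ca0, ca1, ca2, ca3):
    """(theta, tau, theta') for four contiguous C-alpha positions."""
    return (bend_angle(ca0, ca1, ca2),
            torsion_angle(ca0, ca1, ca2, ca3),
            bend_angle(ca1, ca2, ca3))

# Made-up coordinates: a planar "cis" arrangement with right-angle bends.
theta, tau, theta_prime = segment_angles((0, 1, 0), (0, 0, 0),
                                         (1, 0, 0), (1, 1, 0))
```

Sliding this four-atom window along the chain one residue at a time yields one angle triple per window, which is the input that Step 3 would map to one of the 17 letters.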

  31. [Figure: the two bending angles θ and θ’ and the torsion angle τ.]
