
Protein structure comparison and contact maps



Presentation Transcript


  1. Protein structure comparison and contact maps

  2. A protein is a complex molecule with a primary, linear structure (a sequence of amino acids) and a 3-dimensional structure (the protein fold). Protein STRUCTURE determines its FUNCTION. For instance, the Drug Design problem calls for constructing peptides with a 3D shape complementary to a protein, so as to dock onto it.

  3. Problem: Align two 3D protein structures. Motivation: structure alignment is important for:
     • Discovery of protein function (shape determines function)
     • Search in 3D databases
     • Protein classification and evolutionary studies
     • Assessment of fold prediction quality (e.g. CASP)
     • …

  4. Contact Maps

  5. CONTACT MAPS Unfolded protein

  6. CONTACT MAPS Unfolded protein Folded protein = contacts

  7. CONTACT MAPS Unfolded protein Folded protein = contacts Contact map = graph

  8. CONTACT MAPS. Unfolded protein. Folded protein = contacts. Contact map = graph. OBJECTIVE: align 3D folds of proteins = align contact maps
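The idea that a folded protein defines a contact graph can be made concrete with a small sketch. The slides do not say how a contact is defined, so the distance threshold (`threshold`, in the same units as the coordinates), the minimum sequence separation (`min_sep`), and the function name `contact_map` are all assumptions chosen for illustration.

```python
import math

def contact_map(coords, threshold=8.0, min_sep=2):
    """Return the contact map as a set of edges (i, j) with i < j.

    coords    : list of 3D points, one per residue, in sequence order
    threshold : assumed cutoff; residues closer than this are "in contact"
    min_sep   : assumed minimum sequence separation, so that residues that
                are neighbours in the chain are not counted as contacts
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                contacts.add((i, j))
    return contacts

# Toy "fold": five residues on a U-shaped chain (made-up coordinates).
toy_coords = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (5, 5, 0), (0, 5, 0)]
toy_contacts = contact_map(toy_coords)
```

The nodes of the resulting graph are the residues 0, …, n-1 in sequence order, and the edges are the returned pairs.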

  9. Contact maps are related to fold: similar folds correspond to similar contact maps. We studied the problem of determining contact map similarity

  10. Contact maps are related to fold: similar folds correspond to similar contact maps. We studied the problem of determining contact map similarity in the period 2001-2004:
     • I.P. formulation via Branch & Cut (RECOMB)
     • Use of Compact Optimization instead of separation (AIRO)
     • Lagrangian Relaxation (RECOMB)
     (Publications: RECOMB proceedings, AIRO proceedings, OR Letters, Journal of Comp. Bio., 4OR)

  11. The Contact Map Alignment Problem

  12. Non-crossing Alignments: a non-crossing map of residues in protein 1 and protein 2. [Figure: Protein 1 and Protein 2 joined by non-crossing lines]

  13. The value of an alignment

  14. The value of an alignment

  15. The value of an alignment

  16. The value of an alignment Value = 3

  17. The value of an alignment Value = 3 We want to maximize the value
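As a sketch of how the value could be computed: take the value of a non-crossing alignment to be the number of contacts in protein 1 whose two endpoints are both aligned and land on a contact in protein 2. The function names and the representation (contacts as pairs of residue indices, an alignment as a list of `(i, j)` lines) are assumptions for illustration, not taken from the slides.

```python
def is_noncrossing(alignment):
    """True if no two lines (i, j), (i', j') cross or share an endpoint."""
    pairs = sorted(alignment)
    # After sorting, it is enough that both coordinates strictly increase
    # between consecutive lines.
    return all(a[0] < b[0] and a[1] < b[1] for a, b in zip(pairs, pairs[1:]))

def alignment_value(contacts1, contacts2, alignment):
    """Count contacts of protein 1 mapped onto contacts of protein 2."""
    if not is_noncrossing(alignment):
        raise ValueError("alignment must be non-crossing")
    image = dict(alignment)  # residue of protein 1 -> residue of protein 2
    edges2 = {tuple(sorted(c)) for c in contacts2}
    return sum(
        1
        for (i, p) in contacts1
        if i in image and p in image
        and tuple(sorted((image[i], image[p]))) in edges2
    )

# Made-up toy instance.
c1 = {(0, 2), (1, 3), (2, 4)}
c2 = {(0, 2), (1, 3), (3, 5)}
lines = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 5)]
value = alignment_value(c1, c2, lines)
```

In this toy instance, contacts (0,2) and (1,3) are preserved while (2,4) is mapped to (2,5), which is not a contact in the second map.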

  18. The value of an alignment NP-Hard (Goldman, Istrail, Papadimitriou, 1999)

  19. Integer Programming Formulation (5th RECOMB conference)

  20. Integer Programming Formulation. The use of Integer Linear Programming:
     • Model a difficult problem by 0-1 variables, a linear objective function and linear constraints
     • Can find an optimal solution by branch and bound
     • The bound comes from the LP relaxation (polynomial)
     • The bound can be used to assess the quality of any feasible solution

  21. (i) 0-1 VARIABLES. Contact-contact variables: $y_{ef}$ for e and f contacts. Residue-residue variables: $x_{ij}$ for i and j residues.

  22. (ii) OBJECTIVE: maximize $\sum_e \sum_f y_{ef}$ over all feasible x and y

  23. (iii) CONSTRAINTS (FEASIBILITY). Non-crossing: $x_{ij} + x_{i'j'} \le 1$ for each pair of crossing lines [i,j] and [i',j']. Activation: $y_{(ip)(jq)} \le x_{ij}$ and $y_{(ip)(jq)} \le x_{pq}$ for contacts (i,p) and (j,q).
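Putting parts (i) to (iii) together, the model can be written in one place as follows. The symbols $E_1$ and $E_2$ for the contact sets of the two proteins are names introduced here for illustration; the slides do not name them.

```latex
\begin{align*}
\max\quad & \sum_{e \in E_1} \sum_{f \in E_2} y_{ef} \\
\text{s.t.}\quad
  & x_{ij} + x_{i'j'} \le 1
    && \text{for all crossing lines } [i,j],\ [i',j'] \\
  & y_{(ip)(jq)} \le x_{ij}, \quad y_{(ip)(jq)} \le x_{pq}
    && \text{for all } (i,p) \in E_1,\ (j,q) \in E_2 \\
  & x_{ij} \in \{0,1\}, \quad y_{ef} \in \{0,1\}
\end{align*}
```

A contact-contact variable can only be switched on when both residue-residue lines that carry it are switched on, so the objective counts exactly the contacts shared under the alignment.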

  24. Non-crossing Clique Constraints. Variables x define a graph Gx:
     • A node for each line
     • An edge between each pair of crossing lines
     [Figure: crossing lines [i,j] and [i',j']]

  25. Clique Constraints. Variables x define a graph Gx:
     • A node for each line
     • An edge between each pair of crossing lines
     [Figure: crossing lines [i,j] and [i',j']]
     • An independent set corresponds to a non-crossing alignment
     • Gx has nice properties (it's a perfect graph)
     • It's easy (polynomial) to find large independent sets in Gx

  26. Clique Constraints. Non-crossing constraints can be extended to CLIQUE CONSTRAINTS: $\sum_{[i,j] \in M} x_{ij} \le 1$ for all sets M of mutually incompatible (i.e. crossing) lines. Satisfying all clique constraints gives a strong bound!
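A small numeric check shows why clique constraints are stronger than the pairwise ones. Three mutually crossing lines with fractional value 0.5 each satisfy every pairwise non-crossing constraint, yet violate the clique constraint over all three. The `crosses` helper and the convention that lines sharing an endpoint count as incompatible are assumptions made for this sketch.

```python
from itertools import combinations

def crosses(a, b):
    """Two distinct lines are incompatible if they cross or share an endpoint."""
    (i, j), (k, l) = a, b
    return a != b and (i - k) * (j - l) <= 0

# Made-up fractional LP point: three lines, each at value 0.5.
x_frac = {(0, 2): 0.5, (1, 1): 0.5, (2, 0): 0.5}

# The three lines form a clique in Gx: every pair is incompatible.
is_clique = all(crosses(a, b) for a, b in combinations(x_frac, 2))

# Every pairwise constraint x_a + x_b <= 1 holds ...
pairwise_ok = all(
    x_frac[a] + x_frac[b] <= 1
    for a, b in combinations(x_frac, 2)
    if crosses(a, b)
)

# ... but the clique constraint over the whole set does not.
clique_lhs = sum(x_frac.values())
clique_ok = clique_lhs <= 1
```

Here `pairwise_ok` is true while `clique_ok` is false, so adding the clique inequality cuts off this fractional point.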

  27. Maximal cliques in Gx

  28. Structure of Maximal cliques in Gx 1. Pick two subsets of same size

  29. Structure of Maximal cliques in Gx 2. Connect them in a zig-zag fashion

  30. Structure of Maximal cliques in Gx

  31. Structure of Maximal cliques in Gx

  32. Structure of Maximal cliques in Gx

  33. Structure of Maximal cliques in Gx

  34. Structure of Maximal cliques in Gx

  35. Structure of Maximal cliques in Gx

  36. Structure of Maximal cliques in Gx

  37. Structure of Maximal cliques in Gx

  38. Structure of Maximal cliques in Gx

  39. Structure of Maximal cliques in Gx 3. Throw in all lines included in a zig or a zag

  40. Structure of Maximal cliques in Gx 3. Throw in all lines included in a zig or a zag

  41. Structure of Maximal cliques in Gx The result is a maximal clique in Gx

  42. Separation of Clique Inequalities

  43. Separation of Clique Inequalities. PROBLEM: There exist exponentially many such cliques ($O(2^{2n})$ inequalities). How do we add them?

  44. PROBLEM: There exist exponentially many such cliques ($O(2^{2n})$ inequalities). How do we add them? SOLUTION: We don't add them to the original LP, but only when needed at run time. Not all of them will be needed, so we are fine as long as…

  45. PROBLEM: There exist exponentially many such cliques ($O(2^{2n})$ inequalities). How do we add them? SOLUTION: We don't add them to the original LP, but only when needed at run time. Not all of them will be needed, so we are fine as long as… SEPARATION: …we can generate in polynomial time a clique inequality when needed, i.e., when violated by the current LP solution x*: $\sum_{[i,j] \in M} x^*_{ij} > 1$

  46. PROBLEM: There exist exponentially many such cliques ($O(2^{2n})$ inequalities). How do we add them? SOLUTION: We don't add them to the original LP, but only when needed at run time. Not all of them will be needed, so we are fine as long as… SEPARATION: …we can generate in polynomial time a clique inequality when needed, i.e., when violated by the current LP solution x*: $\sum_{[i,j] \in M} x^*_{ij} > 1$. THEOREM: We can find the most violated clique inequality in time $O(n^2)$

  47. Separation of Clique Inequalities. Create an $n_1 \times n_2$ grid, with one node for each pair (i, u), i = 1, …, n1 and u = 1, …, n2

  48. Separation of Clique Inequalities. Create an $n_1 \times n_2$ grid. Orient all edges and give them weights $x^*_{iu}$

  49. Separation of Clique Inequalities. Create an $n_1 \times n_2$ grid. Orient all edges and give them weights. There is a violated clique iff the longest A,B path has length > 1, where A = (1, n2) and B = (n1, 1). [Figure: grid with example weights 0, .20, .15, .35, 0, 0, .25, .20, .30]
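The longest-path idea can be sketched as a dynamic program over the grid. This is a simplified variant under stated assumptions, not the authors' exact construction: weights are placed on grid nodes rather than on oriented edges, indices are 0-based (so A = (0, n2-1) and B = (n1-1, 0)), lines sharing an endpoint count as incompatible, and the function name `most_violated_clique` is hypothetical. Every monotone path from A to B visits cells that are pairwise incompatible, so the heaviest path gives the largest left-hand side of any clique inequality.

```python
def most_violated_clique(x_star):
    """Heaviest monotone A-to-B path in the n1 x n2 grid of LP values.

    x_star[i][u] is the LP value of line [i, u]. A step either increases i
    or decreases u, so all cells on a path are mutually crossing lines.
    Returns (weight, path); the clique inequality is violated iff weight > 1.
    """
    n1, n2 = len(x_star), len(x_star[0])
    best = [[0.0] * n2 for _ in range(n1)]
    prev = [[None] * n2 for _ in range(n1)]
    for i in range(n1):
        for u in range(n2 - 1, -1, -1):
            candidates = []
            if i > 0:
                candidates.append((best[i - 1][u], (i - 1, u)))
            if u < n2 - 1:
                candidates.append((best[i][u + 1], (i, u + 1)))
            if candidates:
                weight, origin = max(candidates)
            else:
                weight, origin = 0.0, None  # cell A has no predecessor
            best[i][u] = x_star[i][u] + weight
            prev[i][u] = origin
    # Walk back from B to A to recover the path.
    path, node = [], (n1 - 1, 0)
    while node is not None:
        path.append(node)
        node = prev[node[0]][node[1]]
    path.reverse()
    return best[n1 - 1][0], path

# Made-up LP point: three mutually crossing lines at 0.5 each.
x_lp = [[0.0, 0.0, 0.5],
        [0.0, 0.5, 0.0],
        [0.5, 0.0, 0.0]]
lhs, cells = most_violated_clique(x_lp)
clique_lines = [c for c in cells if x_lp[c[0]][c[1]] > 0]
```

Each cell is visited once, so the running time is proportional to n1 · n2, in line with the $O(n^2)$ bound stated in the theorem.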

  50. The method that adds violated inequalities by separation is called BRANCH-and-CUT.
     • The method can get stuck in long runs of cut additions, each of which "cuts very little"
     • There is an alternative to this, called COMPACT OPTIMIZATION
