Giuseppe Lancia Robert Carr Brian Walenz Sorin Istrail

101 Optimal PDB Structure Alignments: A Branch-and-Cut Algorithm for the Maximum Contact Map Overlap Problem

Presentation Transcript


  1. 101 Optimal PDB Structure Alignments: A Branch-and-Cut Algorithm for the Maximum Contact Map Overlap Problem. Giuseppe Lancia, Robert Carr, Brian Walenz, Sorin Istrail

  2. Contact Maps

  3-6. CONTACT MAPS: An unfolded protein is a chain of residues. In the folded protein, some residues come into contact, and the contact map is the graph of these contacts (a node per residue, an edge per contact). OBJECTIVE: aligning the 3D folds of two proteins = aligning their contact maps.

  7. Contact Map of a Self-Avoiding Walk [Figure: a self-avoiding walk of residues 1-5 and its 5 × 5 binary contact matrix.]
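To make the contact-map idea concrete, here is a minimal Python sketch that computes the contact matrix of a walk on the 2D square lattice. The contact rule (two residues at least two positions apart along the chain that occupy adjacent lattice sites) and the example walk are illustrative assumptions, not taken from the slides; residues are indexed from 0.

```python
from itertools import combinations


def contact_map(coords):
    """Binary contact matrix of a self-avoiding walk on the 2D square lattice.

    Assumed contact rule (illustrative): residues i and j with j - i >= 2 are
    in contact when their lattice sites are at Manhattan distance 1.
    """
    n = len(coords)
    if len(set(coords)) != n:
        raise ValueError("walk visits a lattice site twice (not self-avoiding)")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")
    cmap = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if j - i < 2:
            continue  # chain neighbours are always adjacent; not a contact
        (xi, yi), (xj, yj) = coords[i], coords[j]
        if abs(xi - xj) + abs(yi - yj) == 1:
            cmap[i][j] = cmap[j][i] = 1  # the map is symmetric
    return cmap


# Hypothetical 5-residue walk that folds back on itself.
walk = [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 1)]
for row in contact_map(walk):
    print(row)
```

In this toy walk the only contact is between residues 0 and 3, so the contact map, read as a graph, has a single edge.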

  8. Contact Map Alignments

  9. Non-crossing Alignments: an alignment is a non-crossing map between the residues of protein 1 and the residues of protein 2.
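A non-crossing alignment can be checked directly from its list of residue pairs. The sketch below is an illustration under one assumption not spelled out on the slide: each residue is aligned at most once, so two lines that share an endpoint are rejected just like two lines that cross.

```python
def is_non_crossing(alignment):
    """Check that residue pairs (i, j) form a non-crossing alignment.

    Assumption: no residue is aligned twice, so lines sharing an endpoint are
    rejected just like lines that cross.
    """
    pairs = sorted(alignment)  # sort by residue of protein 1
    # Consecutive pairs must increase strictly in both proteins; by
    # transitivity this then holds for every pair of lines.
    return all(i1 < i2 and j1 < j2
               for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]))


print(is_non_crossing([(1, 1), (2, 3), (4, 4)]))  # True
print(is_non_crossing([(1, 2), (2, 1)]))          # False: the lines cross
```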

  10-14. The value of an alignment: the number of contacts shared by the two proteins under the alignment (in the example, Value = 3). We want to maximize the value.
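The value of an alignment can be computed by counting shared contacts. In this Python sketch, contacts are stored as pairs (i, p) with i < p and the alignment is a dictionary from residues of protein 1 to residues of protein 2; both representations and the toy data are hypothetical.

```python
def alignment_value(alignment, contacts1, contacts2):
    """Number of contacts shared under a (non-crossing) alignment.

    alignment: dict mapping residues of protein 1 to residues of protein 2.
    contacts1, contacts2: sets of contacts (i, p) with i < p.
    Contact (i, p) of protein 1 is shared with (j, q) of protein 2 when
    i -> j and p -> q. Because a non-crossing alignment preserves order,
    (alignment[i], alignment[p]) is already in increasing order.
    """
    return sum(
        1
        for i, p in contacts1
        if i in alignment and p in alignment
        and (alignment[i], alignment[p]) in contacts2
    )


# Hypothetical toy instance.
E1 = {(1, 4), (2, 5), (3, 6)}
E2 = {(1, 4), (2, 6), (3, 7), (5, 8)}
a = {1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 7}
print(alignment_value(a, E1, E2))  # 3
```

Here each of the three contacts of protein 1 lands on a contact of protein 2, so the value is 3.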

  15-19. Integer Programming Formulation. Uses of integer linear programming: an exact solution, or a heuristic with a guarantee (the LP upper bound). 0-1 VARIABLES: $y_{ef}$ for contacts $e$ (in protein 1) and $f$ (in protein 2), with $y_{ef} = 1$ when contact $e$ is mapped onto contact $f$. CONSTRAINTS: $y_{ef} + y_{e'f'} \le 1$ for every pair of incompatible sharings $(e,f)$, $(e',f')$. OBJECTIVE: $\max \sum_{e,f} y_{ef}$.

  20-21. Independent Set Problem. It is just a huge maximum independent set problem in $G_y$: a node for each sharing, and an edge for each pair of incompatible sharings. $|G_y| = |E_1| \cdot |E_2|$, roughly 5,000 nodes for two proteins with 50 residues and 75 contacts each (75 × 75 = 5,625). The best exact algorithms for maximum independent set can handle at most a few hundred nodes.
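The graph $G_y$ can be built explicitly for toy inputs, which also shows why it grows so quickly: it has one node per pair of contacts. This Python sketch assumes that two sharings are incompatible when any line implied by one conflicts with any line implied by the other, and that lines conflict when they cross or share an endpoint; `build_gy`, `sharing_lines` and `lines_conflict` are hypothetical names.

```python
from itertools import combinations, product


def lines_conflict(l1, l2):
    """Lines [i, j] and [i2, j2] conflict if they are distinct and do not
    preserve order (assumption: sharing an endpoint counts as crossing)."""
    (i1, j1), (i2, j2) = l1, l2
    return l1 != l2 and (i1 - i2) * (j1 - j2) <= 0


def sharing_lines(sharing):
    """A sharing (e, f) with e = (i, p), f = (j, q) implies lines [i, j], [p, q]."""
    (i, p), (j, q) = sharing
    return [(i, j), (p, q)]


def build_gy(contacts1, contacts2):
    """Nodes: all sharings (e, f). Edges: pairs of incompatible sharings."""
    nodes = list(product(contacts1, contacts2))  # |E1| * |E2| nodes
    edges = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if any(lines_conflict(u, v)
               for u in sharing_lines(a) for v in sharing_lines(b))
    ]
    return nodes, edges


nodes, edges = build_gy([(1, 3), (2, 4)], [(1, 3), (2, 4)])
print(len(nodes), len(edges))  # 4 5
```

With two contacts per protein there are 2 × 2 = 4 sharings and 5 incompatibility edges; the only independent pair is the one that aligns 1→1, 2→2, 3→3, 4→4 and shares both contacts.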

  22-24. Node-to-Node Variables. New variables $x$ provide an easy check for the non-crossing conditions. NEW VARIABLES: $x_{ij}$ for residues $i$ (protein 1) and $j$ (protein 2), with $x_{ij} = 1$ when $i$ is aligned to $j$. NEW CONSTRAINTS: $x_{ij} + x_{i'j'} \le 1$ for every pair of crossing lines $[i,j]$, $[i',j']$, and $y_{(ip)(jq)} \le x_{ij}$, $y_{(ip)(jq)} \le x_{pq}$ for contacts $(i,p)$ and $(j,q)$, so a contact can be shared only if both of its endpoints are aligned accordingly.
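Putting the pieces of slides 17-24 together, the complete 0-1 program with both families of variables can be written as below. This consolidated statement is our own assembly of the slide fragments, with $E_1$, $E_2$ denoting the contact sets of the two proteins. In this form the pairwise constraints $y_{ef} + y_{e'f'} \le 1$ are implied: if $(e,f)$ and $(e',f')$ are incompatible, some line of one conflicts with some line of the other, and the linking constraints bound both $y$ variables by that pair of $x$ variables.

```latex
\begin{aligned}
\max \quad & \sum_{e \in E_1} \sum_{f \in E_2} y_{ef} \\
\text{s.t.} \quad & x_{ij} + x_{i'j'} \le 1 && \text{for all incompatible lines } [i,j],\ [i',j'] \\
& y_{(ip)(jq)} \le x_{ij}, \quad y_{(ip)(jq)} \le x_{pq} && \text{for all } (i,p) \in E_1,\ (j,q) \in E_2 \\
& x_{ij},\ y_{ef} \in \{0, 1\}
\end{aligned}
```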

  25-26. Clique Constraints. The variables $x$ define a graph $G_x$: a node for each line $[i,j]$, and an edge between each pair of crossing lines. $G_x$ is much smaller than $G_y$; $G_x$ has nice properties (it is a perfect graph); and it is easier to find large independent sets in $G_x$.
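One way to see why independent sets are easier in $G_x$: an independent set of $G_x$ is a set of pairwise non-crossing lines, i.e. a non-crossing alignment, so a maximum-weight one can be found by a dynamic program in the style of sequence alignment. The Python sketch below illustrates this; the line weights `w` are hypothetical inputs, and lines that share an endpoint are assumed to conflict.

```python
def max_weight_alignment(w):
    """Maximum-weight independent set of Gx = best non-crossing alignment.

    w[i][j]: weight of line [i, j] (0-based residues; hypothetical input).
    Assumption: lines conflict when they cross or share an endpoint, so an
    independent set is a set of lines increasing strictly in both i and j.
    Time O(n1 * n2), like a sequence-alignment DP.
    """
    n1 = len(w)
    n2 = len(w[0]) if n1 else 0
    # best[i][j]: optimum using residues < i of protein 1 and < j of protein 2
    best = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            best[i][j] = max(best[i - 1][j],  # residue i-1 of protein 1 unused
                             best[i][j - 1],  # residue j-1 of protein 2 unused
                             best[i - 1][j - 1] + w[i - 1][j - 1])  # take line
    # Trace back to recover the chosen lines.
    lines, i, j = [], n1, n2
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            lines.append((i - 1, j - 1))
            i, j = i - 1, j - 1
    return best[n1][n2], lines[::-1]


w = [[1, 0, 0],
     [0, 0, 2],
     [0, 3, 0]]
print(max_weight_alignment(w))  # (4.0, [(0, 0), (2, 1)])
```

Lines [1,2] and [2,1] cross, so the sketch keeps the heavier one together with [0,0].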

  27. Clique Constraints. The non-crossing constraints can be extended to CLIQUE CONSTRAINTS $\sum_{[i,j] \in M} x_{ij} \le 1$ for all sets $M$ of mutually incompatible (i.e., crossing) lines. Satisfying all clique constraints, together with the perfection of $G_x$, yields a strong bound!
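The gap between pairwise and clique constraints already shows up on a tiny fractional point. In this Python sketch (hypothetical helper names; 0-based residues; lines that share an endpoint treated as crossing), three mutually crossing lines each get $x^*_{ij} = 0.5$: every pairwise constraint $x_{ij} + x_{i'j'} \le 1$ holds, yet the clique constraint is violated by 0.5.

```python
from itertools import combinations


def is_clique(lines):
    """True if all lines in M pairwise cross (sharing an endpoint counts)."""
    return all((i1 - i2) * (j1 - j2) <= 0
               for (i1, j1), (i2, j2) in combinations(lines, 2))


def clique_violation(x_star, lines):
    """Return sum_{[i,j] in M} x*_ij - 1; a positive value means violated."""
    lines = set(lines)  # each line counted once
    if not is_clique(lines):
        raise ValueError("M is not a set of mutually crossing lines")
    return sum(x_star.get(line, 0.0) for line in lines) - 1.0


x_star = {(0, 2): 0.5, (1, 1): 0.5, (2, 0): 0.5}
M = [(0, 2), (1, 1), (2, 0)]
# Every pairwise (non-crossing) constraint is tight but satisfied ...
assert all(x_star[a] + x_star[b] <= 1 for a, b in combinations(M, 2))
# ... while the clique constraint is violated.
print(clique_violation(x_star, M))  # 0.5
```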

  28-41. Structure of Maximal Cliques in $G_x$: 1. Pick two subsets of the same size (one from each protein). 2. Connect them in a zig-zag fashion. 3. Throw in all lines included in a zig or a zag. The result is a maximal clique in $G_x$.

  42-45. Separation of Clique Inequalities. PROBLEM: There exist exponentially many such cliques ($O(2^{2n})$ inequalities). How do we add them? SOLUTION: We don't add them to the original LP, but only when needed at run time. Not all of them will be needed, so we are fine as long as… SEPARATION: …we can generate, in polynomial time, a clique inequality when it is needed, i.e., when it is violated by the current LP solution $x^*$: $\sum_{[i,j] \in M} x^*_{ij} > 1$. THEOREM: We can find the most violated clique inequality in time $O(n^2)$.

  46-48. Separation of Clique Inequalities. PROOF (sketch): 1) A clique is a zigzag path (figure: residues 1-8 in each protein). 2) Flip one graph (renumber its residues 8, 7, …, 1): the zigzag becomes a left-to-right path.
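The slides stop after step 2 of the proof. One way to finish the argument, sketched here under our own assumptions, is a dynamic program: once one protein is flipped, a clique of $G_x$ (with lines that share an endpoint counted as crossing) lies on a monotone lattice path through the grid of lines, and with nonnegative $x^*$ the heaviest clique is the heaviest such path. The recurrence and the name `most_violated_clique` are illustrative, not the authors' code; the running time is $O(n_1 n_2)$, consistent with the $O(n^2)$ theorem.

```python
def most_violated_clique(x_star, n1, n2, tol=1e-9):
    """Heaviest clique of Gx for LP values x_star, via dynamic programming.

    x_star: dict {(i, j): value >= 0}, 0-based residues, n1, n2 >= 1.
    Assumption: lines [i, j], [i2, j2] cross when (i - i2) * (j - j2) <= 0.
    After flipping protein 2 (j -> n2 - 1 - j), a clique lies on a lattice
    path that moves one step in i or in the flipped j at a time, so the
    heaviest clique is the heaviest such path. Time O(n1 * n2).
    """
    def w(i, jf):  # LP value of line [i, j], indexed by the flipped jf
        return x_star.get((i, n2 - 1 - jf), 0.0)

    neg = float("-inf")
    best = [[neg] * n2 for _ in range(n1)]
    for i in range(n1):
        for jf in range(n2):
            if i == 0 and jf == 0:
                prev = 0.0
            else:
                prev = max(best[i - 1][jf] if i > 0 else neg,
                           best[i][jf - 1] if jf > 0 else neg)
            best[i][jf] = prev + w(i, jf)

    # Walk back from the far corner, following the heavier predecessor.
    i, jf, path = n1 - 1, n2 - 1, []
    while True:
        path.append((i, n2 - 1 - jf))  # report lines in original indices
        if i == 0 and jf == 0:
            break
        if jf == 0 or (i > 0 and best[i - 1][jf] >= best[i][jf - 1]):
            i -= 1
        else:
            jf -= 1
    path.reverse()
    weight = best[n1 - 1][n2 - 1]
    return weight, path, weight > 1.0 + tol  # violated iff weight exceeds 1


weight, path, violated = most_violated_clique(
    {(0, 2): 0.5, (1, 1): 0.5, (2, 0): 0.5}, n1=3, n2=3)
print(weight, violated)  # 1.5 True
print(path)
```

For this fractional point the returned path contains the three lines [0,2], [1,1] and [2,0] and has weight 1.5, so the corresponding clique inequality would be added as a cut.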
