1 / 47

Exploring Folding Landscapes with Motion Planning Techniques

Exploring Folding Landscapes with Motion Planning Techniques. Bonnie Kirkpatrick Montana State University. Dr. Nancy Amato Guang Song Xinyu Tang Texas A&M University. Parallel Architectures, Algorithms, and Optimizations Laboratory. Outline. Motivation: Biopolymers

chongp
Download Presentation

Exploring Folding Landscapes with Motion Planning Techniques

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Exploring Folding Landscapes with Motion Planning Techniques Bonnie Kirkpatrick Montana State University Dr. Nancy Amato Guang Song Xinyu Tang Texas A&M University Parallel Architectures, Algorithms, and Optimizations Laboratory

  2. Outline • Motivation: Biopolymers • Goal: Folding Landscapes • Method: Motion Planning • Application 1: Protein Folding • Application 2: RNA Folding

  3. Motivation: Biopolymers

  4. Protein and RNA Molecules • Protein and RNA molecules are a complex 3-dimensional folding of a sequence of bases. • Primary Structure • Sequence of bases • Each base is represented by a letter of the alphabet • i.e. ACGUGCCAUCG • Obtained from experiment • Tertiary Structure • The sequence loops back on itself and folds in 3-dimensions. Tertiary representation of an RNA molecule.

  5. Tertiary Structure • Chemical bonds (or contacts) form between complementary bases in close proximity. • There are many possible conformationsof the primary sequence. • Example sequence: CACAGAGUGU • Two possible conformations are shown. • Potential energy calculations based on number and types of bonds are used to classify conformations. • The lowest energy conformation is known as the native structure. • Conformations with few bonds and high energy are referred to as unfolded. Two possible conformations of the sequence. Bonds are blue.

  6. Goal: Folding Landscapes

  7. The Folding Process (The Black Box) Unfolded Conformation (high energy) AGGCUACUGGGAGCCUUCUCCCC Physical Laws cause folding Native Conformation (low energy)

  8. Folding Landscapes • Description of the “black box” • A space in which every point corresponds to a conformation (or set of conformations) and its associated potential energy value (C-space). • A complete folding landscape contains a point for every possible conformation of a given sequence. Conformation Space Potential Native State Tetrahymena Ribozyme Landscape [Russell, Zhuang, Babcock, Millett, Doniach, Chu, and Herschlag, 2002]

  9. Folding Landscapes (cont.) • Conformational changesdescribe how a molecule changes physically to fold from one conformation to another • Continuous • Protein Folding Model • Bond angles change with continuous rotations • Discrete • RNA Folding Model • Bonds either exist or do not exist

  10. Features of Folding Landscapes • Folding pathways consist of the set of conformational changes a molecule is likely to fold though when moving from one conformation to another. • N to X to Y • Energy barriers are areas of the landscape with high energy that separate groups of conformations. • Y is separated from X and N • Intermediate states are conformations lying on the folding pathway represent local minimums of potential. • Y and X Native State Mutant α mRNA fragment [Chen and Dill, 2000]

  11. unfolded folded A Protein Folding Pathway

  12. A RNA Folding Pathway Unfolded Energy Barrier Native State Phenylalanine tRNA [Hofacker, 1998]

  13. Mapping Folding Landscapes • Existing techniques for mapping landscapes are limited to relatively short sequences (~200 nucleotides). • A robotics motion planning technique called PRM has successfully been applied to protein folding.

  14. Method: Motion Planning

  15. (Basic) Motion Planning (in a nutshell): Given amovableobject, find asequence of valid configurationsthat moves the object from the start to the goal. start goal obstacles Motion Planning Motion Planning for Foldable Objects: Given a foldable object, find a valid folding sequence that transforms the object from one folded state to another.

  16. [Kavraki, Svestka, and Latombe, 1996] A conformation Probabilistic Roadmap Method (PRM) Conformation space Construct the roadmap: 1. Generate nodes. 2. Connect to form roadmap The Roadmap is like a net being laid down on protein’s potential landscape. Potential • Now the roadmap can be used: • To find a path • To extract multiple paths Native state

  17. Application 1: Protein Folding

  18. Outline • Probabilistic Roadmap Methods (PRMs) for Protein Folding • the native fold is known • bias sampling around native fold • Results • Protein folding landscapes • Secondary structure formation order and validation • timed contact map • Folding kinetics

  19. [Song and Amato, 2001] • amino acid: pair of phi/psi angles • protein: a sequence of amino acids. • conformation node is: Model of a protein

  20. PRM: Node Generation N • Take advantage of the known native state. • map the potential landscape/funnel leading to it. • sample around it and gradually grow out. • generate conformations by randomly selecting phi/psi angles • Criterion for accepting a node: Compute potential energy E of each node and retain it with probability P(E):

  21. Native state PRM: Node Generation • Start with native structure. • Gradually grow out. Denser distribution around native state

  22. Find k closest nodes for each roadmap node • Assign edge weight to reflect energetic feasibility: • lower weight  more feasible • [Singh, Latombe, Brutlag, 1999] Native state PRM: Roadmap Connection

  23. where Energy Computation • Potential (ref. Levitt’83) • van der Waals + hydrogen bonds + disulphide bonds + hydrophobic effect • All-atom model • Free Energy (ref. Fiebig & Dill ’93, Munoz & Eaton’99)

  24. PRMs for Protein Folding: Key Issues • Validation • In RECOMB ‘01 (Song & Amato), our results validated with hydrogen exchange experiments. [Li & Woodward 1999] • Energy Functions • The degree to which the roadmap accurately reflects folding landscape depends on the quality of energy calculation.

  25. Analysis of Landscape • Folding Potential Landscape • Secondary structure formation order • timed contact map • experimental validation • Studying Folding Kinetics • 2-state folding kinetics • calculation of folding rates • identifying 2-state, 3-state, … k-state kinetics

  26. Distributions for different types:Potential Energy vs. RMSD for roadmap nodes all alpha alpha + beta all beta

  27. protein G (domain B1) 135 142 (IV:  1-4)   1-2 1-4 140 143 114 140 143 140 141 142 144 139 143 143 131  3-4 Timed Contact Map:formation order for a Path residue # residue #  Formation order: ,  3-4,  1-2,  1-4 Average T = 142

  28. Protein GB1(56 amino acids) • 1 alpha helix & 4 beta-strands Hydrogen Exchange Results first helix, and beta 3-4 Our Paths 80%: helix, beta 3-4, beta 1-2, beta 1-4 20%: helix, beta 1-2, beta 3-4, beta 1-4 Validating Folding Pathways [Li & Woodward 1999] • our paths are: from all the nodes with little structure to the native state • secondary structure formation order checked on each path w/ timed contact map

  29. Secondary structure formation order and validation • Proteins primarily from [Munoz & Eaton PNAS’99] for comparison purposes • Contact us if you want us to analyze your proteins!

  30. F U N R Folding kinetics:statistical mechanical model • Proteins treated as statistical system • Define interactions => partition function • Free energy as a function of reaction coordinate (R) • Then decide folding kinetics and folding rate • [Munoz & Eaton, Alm & Baker. PNAS’99] • Assumption and limitation of statistical model • Very limited interactions to simplify partition function calculation • Assume selected reaction coordinate good (monotonically increasing) • Cannot provide folding trajectories • Strength: as a theoretical model, it is good for analysis

  31. Free Energy Native Contacts Free Energy Landscape:2-state folding kinetics • Our method can produce similar results (plus more). • 2-state folding kinetics indicated • Both plots ‘imply’ nativelikeness should always increase • Plots like these lose info because of averaging effect our roadmap model statistical model [Munoz & Eaton PNAS’99] Blue, red and green lines are three levels of approximation Blue line: free energy average

  32. cluster paths into several groups extract 2-state,3-state, …, k-state kinetics from same roadmap not possible with statistical mechanical models A B A B Average 3-state 2-state 2-state Studying folding kinetics at pathway level

  33. Protein G Native contacts Free energy Studying folding kinetics at pathway level • trajectories not available from statistical model • native contact not monotonically increasing • Diverse free energy profiles

  34. Protein Folding:Conclusion & Future Work • PRM roadmaps approximate folding landscapes • Efficiently produce multiple folding pathways • Secondary structure formation order • better than trajectory-based simulation methods, such as Monte Carlo, molecular dynamics • Provide a good way to study folding kinetics • multiple folding kinetics in same landscape (roadmap) • natural way to study the statistical behavior of folding • more realistic than statistical models (e.g. Lattice models, Baker’s model PNAS’99, Munoz’s model, PNAS’99)

  35. Application 2: RNA Folding

  36. Outline • RNA Model using secondary structure • Conformation Space • Node Generation • Node Evaluation • Node Connection • Edge Weights

  37. RNA Secondary Structure • Two-dimensional representation of the tertiary structure • Planar representation • Sufficient structural information • Pseudo knots are considered a tertiary structure, rather than a secondary structure

  38. Violates criteria (1) Violates criteria (2) Secondary Structure Formalized • A secondary structure conformation is specified by a set of intra-chain contacts (bonds between base pairs) that follow certain rules. • Given any two intra-chain contacts [i, j] with i < j and [k,l] with k < l, then: • If i = k, then j = l • Each base can appear in only one contact pair • If k < j, then i < k < l < j • No pseudo-knots

  39. C-space Where n is the number of possible contact pairs PRM: Conformation Space • Let U be the set of every possible combination of contact pairs. • Let C-space (the conformation space), C, be the sub-set of U containing only valid secondary structures. • C-space is smaller than U, but is still very large. • Sequence: (ACGU)10 • Length: 40 nucleotides • C-Space: 1.6x108 structures • Purpose: generate nodes in C-space that describe the space without covering it

  40. C-space PRM: Node Generation • Random Node Generation Algorithm • Starting with an empty configuration, c, random contacts are added to c one at a time. • Each step preserves the condition that c contains a valid set of base pair contacts. • Contacts are added until there are no more contacts that do not conflict with the contact set of c. • Every node generated has valid secondary structure and is a member of C-space. • Since every generated node has the maximal number of contacts, the sampling is biased toward the area of C-space near the native state.

  41. PRM: Node Evaluation • Evaluation of Nodes • Potential energy determines how good a node is. • Only add a node to the roadmap if it has a low energy. • Probability of a node q being added to the roadmap:

  42. C1 = S1 S2 Sn-1 Sn = C2 PRM: Node Connection • Given two nodes in C-Space, C1 and C2, find a path between them consisting of a sequence of nodes: { C1 = S1, S2, …, Sn-1, Sn = C2 } • The path must have the property that for each i, 1 < i<n, the set of contact pair of Si differs from that of Si-1 by the application of one transformation operation: (1) open or (2) close a single contact pair.

  43. Node Connection (cont.) • There exists a path between any two nodes in C-Space. • Not just any path will do; we want a good one. • Bad paths have high energy nodes in them. • How do we find the lowest energy path?

  44. Node Connection (cont.) • more contacts  less potential energy • Heuristic: if a contact is opened by the transition from one node to another, try to close a contact in the next transition c1 = s1: ..(.((..))).. open s2: ..(.(....)).. close s3: ..(.((.).)).. open s4: ..(..(.)..).. close C2 = s5: ..(.((.)).)..

  45. Edge Weight • Depends on the nodes generated in the node connection phase. • Difference in potential energy • ΔEi = E(si+1) – E(si)

  46. Future Work • Analysis of the roadmap • Finding the low energy folding pathways • Shortest path algorithm • Validation • How do we know if our results agree with experimental results? • Proposal • Compare a fully enumerated roadmap to experimental folding rates • Compare a more sparse roadmap with a fully enumerated roadmap • Proposal • Solve the master equation using stacking pairs (they are representative of all the dynamics) and our model • Compare our results with results from the Zhang and Chen’s statistical mechanical model [2002]

  47. References Shi-Jie Chen and Ken A. Dill. Rna folding energy landscapes. PNAS, 97:646-651, 2000. Ivo L. Hofacker. Rna secondary structures: A tractable model of biopolymer folding. J.Theor.Biol., 212:35-46, 1998. Ivo L. Hofacker Jan Cupal and Peter F. Stadler. Dynamic programming algorithm for the density of states of rna secondry structures. Computer Science and Biology 96, 96:184-186, 1996. L. Kavraki, P. Svestka, J. C. Latombe, and M. Overmars. Probabilistic roadmaps for path planning in high-dimensional conguration spaces. IEEE Trans. Robot. Automat., 12(4):566-580, August 1996. J. C. Latombe. Robot Motion Planning. Kluwer Academic Publishers, Boston, MA, 1991. R. Li and C. Woodward. The hydrogen exchange core and protein folding. Protein Sci., 8:1571-1591, 1999. John S. McCaskill. The equilibrium partition function and base pair binding probabilities for rna secondary structure. Biopolymers, 29:1105-1119, 1990. Ruth Nussinov, George Piecznik, Jerrold R. Griggs, and Danel J. Kleitman. Algorithms for loop matching. SIAM J. Appl. Math., 35:68-82, 1972. R. Russell, X. Zhuang, H. Babcock, I. Millet, S. Doniach, S. Chu, and D. Herschlag. Exploring the folding landscape of a structured RNA. Proc. Natl. Acad. Sci., 99:155-60., 2002. Proc. Natl. Acad. Sci. U.S.A. 99, 155-60. D. Sanko and J.B. Kruskal. Time warps, string edits and macromolecules: the theory and practice of sequence comparison. Addison Wesley, London, 1983. A.P. Singh, J.C. Latombe, and D.L. Brutlag. A motion planning approach to exible ligand binding. In 7th Int. Conf. on Intelligent Systems for Molecular Biology (ISMB), pages 252-261, 1999. G. Song and N. M. Amato. Using motion planning to study protein folding pathways. In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pages 287-296, 2001. Stefan Wuchty. Suboptimal secondary structures of rna. Master Thesis, 1998.

More Related