
Conformational Space

Conformation of a molecule: specification of the relative positions of all atoms in 3D space. Typical parameterizations: a list of coordinates of atom centers, or a list of torsional angles (e.g., the φ-ψ-χ angles of a protein).

kelton

Presentation Transcript


  1. Conformational Space

  2. Conformational Space • Conformation of a molecule: specification of the relative positions of all atoms in 3D space • Typical parameterizations: • List of coordinates of atom centers • List of torsional angles (e.g., the φ-ψ-χ angles of a protein) • Conformational space: space of all conformations

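  3. Conformational Space [Figure labels: torsional angles θ1, θ2, ..., θi, θj, ..., θN-1, θN]

  4. Conformational Space [Figure labels: θ0, θ1, θ3, θ4, ..., θn]

  5. Relation to Robotics/Graphics • Configuration space [Figure labels: θ0, θ1, θ2, θ3, θ4, ..., θn and a path τ(t)]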

  6. Need for a Metric • Simulation and sampling techniques can produce millions of conformations • Which conformations are similar? • Which ones are close to the folded one? • Do some conformations form small clusters (e.g. key intermediates while folding)?

  7. Metric in Conformational Space • A metric over conformational space C is a function $d: (c, c') \in C \times C \mapsto d(c, c') \in \mathbb{R}^+ \cup \{0\}$ such that: • $d(c, c') = 0 \iff c = c'$ (non-degeneracy) • $d(c, c') = d(c', c)$ (symmetry) • $d(c, c') + d(c', c'') \ge d(c, c'')$ (triangle inequality)

  8. But not all metrics are “good” • Euclidean metric: $d(c, c') = \sum_{i=1}^{n} \left( |\phi_i - \phi_i'|^2 + |\psi_i - \psi_i'|^2 \right)$

  9. Metric in Conformational Space • A “good” metric should measure how well the atoms in two conformations can be aligned • Usual metrics: cRMSD, dRMSD

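  10. RMSD • Given two sets of n points in $\mathbb{R}^3$, $A = \{a_1, \ldots, a_n\}$ and $B = \{b_1, \ldots, b_n\}$ • The RMSD between A and B is: $\mathrm{RMSD}(A, B) = \left[ \frac{1}{n} \sum_{i=1}^{n} \|a_i - b_i\|^2 \right]^{1/2}$, where $\|a_i - b_i\|$ denotes the Euclidean distance between $a_i$ and $b_i$ in $\mathbb{R}^3$ • $\mathrm{RMSD}(A, B) = 0$ iff $a_i = b_i$ for all i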
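
The RMSD definition above can be sketched in a few lines of plain Python. The function name `rmsd` and the representation of points as (x, y, z) tuples are assumptions made for illustration; the slides do not prescribe an implementation.

```python
import math

def rmsd(A, B):
    """RMSD between two equal-length lists of 3D points, paired by index."""
    if len(A) != len(B) or not A:
        raise ValueError("A and B must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(A, B):
        # squared Euclidean distance between the paired points a_i and b_i
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(A))

# Illustrative data: B is A shifted by 2 along z, so every pair is 2 apart.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(A, B))
```

Note that this plain RMSD depends on where the two point sets sit in space: a pure translation of one set already produces a nonzero value.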

  11. cRMSD • Molecule M with n atoms $a_1, \ldots, a_n$ • Two conformations c and c' of M • $a_i(c)$ is the position of $a_i$ when M is at c • cRMSD(c, c') is the minimized RMSD between the two sets of atom centers: $\min_T \left[ \frac{1}{n} \sum_{i=1}^{n} \|a_i(c) - T(a_i(c'))\|^2 \right]^{1/2}$, where the minimization is over all possible rigid-body transforms T
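
One common way to carry out this minimization is the Kabsch algorithm, sketched below with NumPy. The slides do not name a method, so the choice of Kabsch, the function name `crmsd`, and the (n, 3) array layout are assumptions. The sketch superimposes centroids (optimal translation), then finds the optimal proper rotation from the SVD of a 3x3 covariance matrix.

```python
import numpy as np

def crmsd(P, Q):
    """Minimum RMSD between (n, 3) point sets P and Q over rigid-body
    transforms of Q (rotation + translation, no reflection)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (n, 3)")
    # Optimal translation: move both centroids to the origin.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # 3x3 covariance between the set being rotated (Q) and the target (P).
    H = Qc.T @ Pc
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1) so mirror images are not matched.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc - Qc @ R.T  # rows of Qc rotated by R
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Because the SVD is taken on a fixed-size 3x3 matrix, the cost of this sketch grows linearly with the number of atoms n.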

  12. cRMSD • cRMSD satisfies the triangle inequality • cRMSD takes time linear in the number of atoms to compute • Often, cRMSD is restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone

  13. Representation Restricted to Cα Atoms - The positions of amino-acid residue centers (Cα atoms) mainly determine the structure of a protein. - In structural comparison, one usually works only with the backbone of Cα atoms and neglects the other atoms. [Figure: protein 1tph]

  14. Possible project: Design a method for efficiently finding nearest neighbors in a sampled conformation space of a protein, using the cRMSD metric.

  15. dRMSD • Molecule M with n atoms $a_1, \ldots, a_n$ • Two conformations c and c' of M • $\{d_{ij}(c)\}$: n×n symmetric intra-molecular distance matrix in M at c • dRMSD(c, c') is: $\left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$ • $\{d_{ij}\}$ is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone

  16. Intra-Molecular Distance Matrix: Distances between Cα pairs of a protein with 142 residues. Darker squares represent shorter distances.

  17. Intra-Molecular Distance Matrix: Distances between Cα pairs of a protein with 142 residues. Darker squares represent shorter distances. [Figure labels: 1, 40, 45, 85]

  18. Intra-Molecular Distance Matrix

  19. dRMSD • Molecule M with n atoms $a_1, \ldots, a_n$ • Two conformations c and c' of M • $\{d_{ij}(c)\}$: n×n symmetric intra-molecular distance matrix in M at c • $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$ • $\{d_{ij}\}$ is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone

  20. dRMSD • Molecule M with n atoms $a_1, \ldots, a_n$ • Two conformations c and c' of M • $\{d_{ij}(c)\}$: n×n symmetric intra-molecular distance matrix in M at c • $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$ • $\{d_{ij}\}$ is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone • Advantage: No aligning transform • Drawback: Takes quadratic time to compute
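
A minimal sketch of dRMSD in plain Python follows. The function names and the tuple-based point representation are assumptions for illustration. The mean is taken over the n(n-1)/2 distinct atom pairs, which corresponds to the 2/(n(n-1)) factor in the formula.

```python
import math
from itertools import combinations

def pair_distances(points):
    """Upper triangle of the intra-molecular distance matrix, as a flat list."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

def drmsd(c1, c2):
    """dRMSD between two conformations given as equal-length lists of 3D points."""
    if len(c1) != len(c2) or len(c1) < 2:
        raise ValueError("need two conformations with the same n >= 2 atoms")
    d1 = pair_distances(c1)
    d2 = pair_distances(c2)
    # len(d1) == n(n-1)/2, so dividing by it applies the 2/(n(n-1)) factor.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d1, d2)) / len(d1))

# Illustrative data: the second conformation is the first one scaled by 2.
c = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in c]
print(drmsd(c, stretched))
```

No alignment step appears anywhere: only internal distances are compared, so translating or rotating either conformation leaves the result unchanged. The double loop over pairs is where the quadratic cost comes from.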

  21. Is dRMSD a metric? • $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$ is a metric in the n(n-1)/2-dimensional space, where a conformation c is represented by $\{d_{ij}(c)\}$ • But, in this representation, the same point represents both a conformation and its mirror image

  22. k-Nearest-Neighbors Problem Given a set S of conformations of a protein and a query conformation c, find the k conformations in S most similar to c (w.r.t. cRMSD, dRMSD, or another metric). Can be done in time $O(N(\log k + L))$, where: - N = size of S - L = time to compare two conformations
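
The stated bound can be reached with a single scan that keeps the k best candidates in a bounded max-heap: each of the N comparisons costs L, and each heap update costs at most O(log k). The sketch below assumes a caller-supplied distance function `dist`; the names `k_nearest` and `dist` are illustrative and not from the slides.

```python
import heapq

def k_nearest(S, query, k, dist):
    """Return the k items of S closest to query as sorted (distance, index) pairs."""
    if k <= 0:
        return []
    heap = []  # max-heap on distance, emulated by storing negated distances
    for idx, conf in enumerate(S):
        d = dist(query, conf)
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif -d > heap[0][0]:
            # d is smaller than the current k-th best distance: replace it
            heapq.heapreplace(heap, (-d, idx))
    return sorted((-neg_d, idx) for neg_d, idx in heap)

# Toy usage with 1D "conformations" and absolute difference as the metric.
S = [5.0, 1.0, 9.0, 3.0, 7.0]
print(k_nearest(S, 4.0, 2, lambda a, b: abs(a - b)))
```

Any metric can be plugged in as `dist`, for example a cRMSD or dRMSD function operating on lists of atom coordinates.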

  23. k-Nearest-Neighbors Problem The total time needed to compute the k nearest neighbors of every conformation in S is $O(N^2(\log k + L))$. Much too long for large datasets, where N ranges from tens of thousands to millions! Can be improved by: 1. Reducing L 2. Using a more efficient algorithm (e.g., kd-tree)

  24. kd-Tree In a d-dimensional space, where d > 2, range searching for a point takes $O(d\,n^{1-1/d})$ time
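
For readers unfamiliar with the data structure, here is a minimal kd-tree sketch in plain Python with a single-nearest-neighbor query. It assumes conformations have already been mapped to fixed-length coordinate vectors compared with the Euclidean distance; the nested-tuple layout and the function names are illustrative choices, not the implementation behind the slides.

```python
import math

def build_kdtree(points, depth=0):
    """Build a kd-tree as nested tuples (point, left, right); None if empty."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the coordinates
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2            # median point becomes the node
    return (pts[mid],
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    point, left, right = node
    d = math.dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best

points = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build_kdtree(points)
print(nearest(tree, (9.0, 2.0)))  # closest stored point is (8.0, 1.0)
```

The pruning test is what saves work relative to a linear scan, and it becomes less effective as the dimension d grows, which is one motivation for reducing the dimension of the representation before building the tree.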

  25. k-Nearest-Neighbors Problem Idea: simplify protein’s description

  26. Assume that each conformation is described by the coordinates of the n Cα atoms • cRMSD → $O(n)$ time • dRMSD → $O(n^2)$ time

  27. This representation is highly redundant • Proximity along the chain entails spatial proximity • Atoms can’t bunch up, hence atoms far apart along the chain are on average spatially distant [Figure labels: ci, cj]

  28. m-Averaged Approximation • Cut the backbone into fragments of m Cα atoms • Replace each fragment by the centroid of its m Cα atoms • ⇒ Simplified cRMSD and dRMSD • 3n coordinates → 3n/m coordinates
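
The averaging step can be sketched as follows in plain Python. The function name `m_average` is illustrative, and the handling of a trailing fragment shorter than m (it is averaged over the atoms it has) is an assumption, since the slides do not say what happens when m does not divide n.

```python
def m_average(coords, m):
    """Replace each run of m consecutive backbone points by its centroid.

    coords: list of (x, y, z) tuples in chain order.
    Assumption: a final fragment with fewer than m points is averaged as-is.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    averaged = []
    for start in range(0, len(coords), m):
        fragment = coords[start:start + m]
        # zip(*fragment) groups all x's, all y's, all z's of the fragment
        averaged.append(tuple(sum(axis) / len(fragment) for axis in zip(*fragment)))
    return averaged

# Eight points on a line, averaged in fragments of four.
backbone = [(float(i), 0.0, 0.0) for i in range(8)]
print(m_average(backbone, 4))
```

The reduced conformations can then be fed to the same cRMSD or dRMSD routine as the full ones, with n replaced by roughly n/m.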

  29. Evaluation: Test Sets [Lotan and Schwarzer, 2003] • 8 diverse proteins (54-76 residues) • Decoy sets of N = 10,000 conformations from the Park-Levitt set [Park et al., 1997] • Correlation: higher correlation for random sets (⇒ greater savings)

  30. Running Times

  31. Further Reduction for dRMSD • Stack m-averaged distance matrices as vectors of a matrix A

  32. [Figure: r × N matrix A; column $a_i$ is the vector of elements of the distance matrix of the ith conformation (i = 1 to N)]

  33. Further Reduction for dRMSD • Stack m-averaged distance matrices as vectors of a matrix A • Compute the SVD $A = U D V^T$

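  34. SVD Decomposition • $A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$ • Column $a_j$ of A: vector of elements of the distance matrix of the jth conformation (j = 1 to N) • U: orthonormal (rotation) matrix • D: diagonal matrix

  35. SVD Decomposition • $A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$ • Column $a_j$ of A: vector of elements of the distance matrix of the jth conformation (j = 1 to N) • U: orthonormal (rotation) matrix • D: diagonal matrix $\mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ with $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r \ge 0$ (singular values)

  36. SVD Decomposition • $A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$ • Column $a_j$ of A: vector of elements of the distance matrix of the jth conformation (j = 1 to N) • U: orthonormal (rotation) matrix • D: diagonal matrix • $V^T$: matrix with orthonormal rows $v_j^T$, $v_k^T$; the $v_j$ are mutually orthogonal unit N×1 vectors

  37. SVD Decomposition • $A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$ • Representation of A in space (X, Y) does not depend on the coordinate system! [Figure labels: axes x, y and X, Y in the r-dimensional space]

  38. SVD Decomposition • $A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$ • D holds $\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_r$; $V^T$ has rows $v_1^T, v_2^T, \ldots$ • $\|\sigma_1 v_1\| \ge \|\sigma_2 v_2\| \ge \ldots$

  39. SVD Decomposition • $A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$ • D holds $\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_r$ • p principal components: rows $v_1^T, v_2^T, \ldots, v_p^T$ of $V^T$

  40. SVD Decomposition • $A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$ • Keep $\sigma_1, \sigma_2, \ldots, \sigma_p$ and set the remaining singular values to 0 • p principal components: rows $v_1^T, v_2^T, \ldots, v_p^T$ of $V^T$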

  41. Further Reduction for dRMSD • Stack m-averaged distance matrices as vectors of a matrix A • Compute the SVD $A = U D V^T$ • Project onto p principal components
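
The three steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code: A is taken to be an r × N array whose columns are the flattened distance vectors, no mean-centering is applied because the slides do not mention any, and the names `project_onto_pcs` and `approx_drmsd` are made up for this example.

```python
import numpy as np

def project_onto_pcs(A, p):
    """Coordinates of each column of A on the first p left singular vectors.

    A: (r, N) array, one conformation's distance vector per column.
    Returns a (p, N) array equal to U[:, :p].T @ A, i.e. D_p V_p^T.
    """
    A = np.asarray(A, dtype=float)
    if not 1 <= p <= min(A.shape):
        raise ValueError("p must be between 1 and min(r, N)")
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first p rows of V^T by their singular values.
    return s[:p, None] * Vt[:p, :]

def approx_drmsd(Y, i, j, r):
    """Approximate dRMSD between conformations i and j from projected data Y.

    Assumes each original column held all r pairwise distances, so that
    dRMSD equals the Euclidean distance between columns divided by sqrt(r).
    """
    return float(np.linalg.norm(Y[:, i] - Y[:, j]) / np.sqrt(r))
```

Since the projection is orthogonal, distances between projected columns never exceed the original ones, and each comparison sums only p squared terms instead of r.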

  42. Correlation between dRMSD and its PC-projected approximation • Computing the approximation is reduced to summing up 12 to 20 terms (instead of ~80 to 200, since the proteins have 54 to 76 amino acids)

  43. Complexity of SVD • SVD of an r×N matrix, where N > r, takes $O(r^2 N)$ time • Here $r \sim (n/m)^2$ • So, time complexity is $O(n^4 N)$ • Would be too costly without m-averaging
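
As a rough worked example of this scaling, take n = 64 residues and m = 4. These values are chosen for illustration, and r is taken to be the number of distinct pairwise distances:

```latex
\begin{align*}
r_{\text{full}} &= \frac{n(n-1)}{2} = \frac{64 \cdot 63}{2} = 2016, \\
r_{m} &= \frac{(n/m)\,(n/m - 1)}{2} = \frac{16 \cdot 15}{2} = 120, \\
\frac{r_{\text{full}}^{2}\, N}{r_{m}^{2}\, N} &= \left( \frac{2016}{120} \right)^{2} = 16.8^{2} \approx 282 .
\end{align*}
```

Under the $O(r^2 N)$ estimate, 4-averaging would therefore cut the SVD cost by a factor of a few hundred in this example, which is on the order of $m^4 = 256$.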

  44. Evaluation for 1CTF Decoy Sets [Lotan and Schwarzer, 2003] • N = 100,000, k = 100, 4-averaging, 16 PCs • 70% correct, with furthest NN off by 20% • Brute-force: 84 h • Brute-force + m-averaging: 4.8 h • Brute-force + m-averaging + PC: 41 min • kd-tree + m-averaging + PC: 19 min • Speedup greater than 200× • 6k approximate NNs contain all true k NNs • ⇒ Use m-averaging and PC reduction as fast filters
