Stabbing balls and simplifying proteins

This research paper discusses the problem of simplifying proteins for various computational applications, such as visualization, matching, and searching in protein databases. It introduces a new algorithm for line segment-based simplification of proteins, with better performance than previous solutions. Experimental results demonstrate the effectiveness of the proposed algorithm on protein chains with thousands of atoms.


Presentation Transcript


  1. Stabbing balls and simplifying proteins. Ovidiu Daescu and Jun Luo, Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080

  2. Problem definition
     • Input: an indexed sequence of balls $B = \{B_1, B_2, \dots, B_n\}$ in $\mathbb{R}^3$, with each $B_i$ specified by a center and radius pair $(p_i, \varepsilon_i)$.
     • Let $C = \{p_1, p_2, \dots, p_n\}$ be the set of center points.
     • Find a set of stabbers defined by a subset $P = \{p_{i_1}, p_{i_2}, \dots, p_{i_m}\}$ of $C$ such that:
       • $i_1 = 1$, $i_m = n$, and $i_j \in \{1, 2, \dots, n\}$ for $j = 1, 2, \dots, m$,
       • $i_j < i_{j+1}$ for $j = 1, 2, \dots, m-1$,
       • the line segment $p_{i_j} p_{i_{j+1}}$ (or the line through $p_{i_j}$ and $p_{i_{j+1}}$) stabs each of the balls $\{B_{i_j}, B_{i_j + 1}, \dots, B_{i_{j+1}}\}$,
       • there is no other subset $P'$ of $C$ satisfying the first three conditions and of smaller size than $P$, i.e., $m$ is minimized.
     [Figure: a ball $B_i$ with center $p_i$ and radius $\varepsilon_i$, crossed by a stabber.]
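The definition above can be read as a shortest-path problem over the indices 1..n: an edge (i, j) exists when the segment between the two centers stabs every ball in between. The following is a minimal brute-force sketch of that reading for the Euclidean segment case only. It runs in cubic time and is not the authors' algorithm; the function names `point_segment_distance` and `min_stabbing_subset` are hypothetical, and indices are 0-based.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab in R^3."""
    ab = [b[k] - a[k] for k in range(3)]
    ap = [p[k] - a[k] for k in range(3)]
    denom = sum(c * c for c in ab)
    if denom == 0:
        t = 0.0  # degenerate segment: a single point
    else:
        # Projection parameter of p onto the line ab, clamped to the segment.
        t = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [a[k] + t * ab[k] for k in range(3)]
    return math.dist(p, closest)

def min_stabbing_subset(centers, radii):
    """Indices (0-based) of a minimum-size subset P, found by brute force.

    Assumption: a segment stabs ball k when its distance to center k
    is at most radius k. The endpoint balls are stabbed trivially.
    """
    n = len(centers)
    if n == 0:
        return []

    def stabs(i, j):
        return all(
            point_segment_distance(centers[k], centers[i], centers[j]) <= radii[k]
            for k in range(i + 1, j)
        )

    best = [math.inf] * n   # best[j]: fewest selected points on a valid path 0..j
    prev = [-1] * n
    best[0] = 1
    for j in range(1, n):
        for i in range(j):
            if best[i] + 1 < best[j] and stabs(i, j):
                best[j] = best[i] + 1
                prev[j] = i

    path, k = [], n - 1
    while k != -1:
        path.append(k)
        k = prev[k]
    return path[::-1]

# Made-up example: the second center can be skipped, the fourth cannot.
centers = [(0, 0, 0), (1, 0.05, 0), (2, 0, 0), (3, 2, 0), (4, 0, 0)]
radii = [0.1] * 5
print(min_stabbing_subset(centers, radii))  # [0, 2, 3, 4]
```

Consecutive indices are always connected (there are no balls strictly between them), so a valid subset always exists.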

  3. Applications
     • Simplification of proteins for visualization, manipulation, (approximate) matching and searching in protein databases, and neural map representation.
     • The problem is a generalization of the polygonal chain simplification problem.
     [Figure: a point $p_i$ within tolerance $\varepsilon$ of the approximating segment.]

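  4. Key difference from chain simplification: chain simplification uses a single tolerance $\varepsilon$ for every point, whereas our simplification uses a separate radius $\varepsilon_i$ for each ball $B_i$.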

  5. Related Works
     • Sergey Bereg, Cylindrical Hierarchy for Deforming Necklaces, International Journal of Computational Geometry & Applications, 14(1-2): 3-18, 2004.
     • Compute an optimal cylindrical cover of a necklace with n beads (balls) in $\mathbb{R}^3$ in polynomial time.
     • The n balls are ordered in sequence; if not, the problem is NP-hard.

  6. Related Works
     • Binhai Zhu, Approximating 3D points with Cylindrical Segments, International Journal of Computational Geometry & Applications, 14(3): 189-201, 2004.
     • Given a set S of n points in $\mathbb{R}^3$, compute k cylindrical segments enclosing S such that the sum of their radii is minimized.
     • For unordered points: NP-hard.
     • A polynomial time approximation scheme (PTAS) for any fixed k > 1 is possible.
     • Used for constructing neural maps and some other computational biology applications.

  7. Related Works
     • Frederic Vivien, Nicolas Wicker, Minimal Enclosing Parallelepiped in R3, CG:T&A, 29(2004), 177-190.
     • Find the minimum-volume parallelepiped enclosing a set of n points.
     • $O(n^6)$ time.

  8. Our results
     • Quadratic or near-quadratic time solutions for line segment stabbers.
     • Subcubic, $O(n^{2.4} \log^{O(1)} n)$ time for line stabbers.
     • Experimental results:
       • for proteins with thousands of atoms, our solutions perform much better than previous solutions;
       • the actual running time is much smaller than the worst-case bound suggests.

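  9. Line segment based simplification, used for protein simplification
     • $L_p$ metric: the distance between two points X and Y is $L_p(X, Y) = \left(\sum_k |X_k - Y_k|^p\right)^{1/p}$, where $X_k$ and $Y_k$ are the k-th coordinates.
     • For example, with $X = (x_1, y_1)$ and $Y = (x_2, y_2)$:
       • $L_1 = |x_1 - x_2| + |y_1 - y_2|$
       • $L_2 = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
       • $L_\infty = \max(|x_1 - x_2|, |y_1 - y_2|)$
     [Figure: X and Y separated by 4 horizontally and 3 vertically, giving $L_1 = 7$, $L_2 = 5$, $L_\infty = 4$.]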
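The three distances in the slide's example can be reproduced with a few lines of code. This is only an illustrative sketch: the helper name `lp_distance` is hypothetical, and the coordinates (0, 0) and (4, 3) are chosen to match the 4-by-3 offsets in the figure.

```python
def lp_distance(x, y, p):
    """L_p distance between two points given as equal-length sequences.

    Pass p = float("inf") for the L-infinity (maximum) metric.
    """
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

X, Y = (0, 0), (4, 3)
print(lp_distance(X, Y, 1))             # 7.0
print(lp_distance(X, Y, 2))             # 5.0
print(lp_distance(X, Y, float("inf")))  # 4
```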

  10. Line segment based simplification
     • $L_2$ metric: $O(n^2 \log n)$ time, $O(n^2)$ space algorithm.
     • Similar to the polygonal chain simplification algorithm of Daescu et al.
     • Replace the line segment $p_i p_j$ by two rays; for each ray, intersect it and the projections from $p_i$ (or $p_j$) of the balls $\{B_{i+1}, B_{i+2}, \dots, B_j\}$ with a plane.
     • This reduces to deciding whether the projection of $p_i$ along the ray is within the common intersection of some disks (the projected balls).

  11. Line segment based simplification
     • $L_1$ or $L_\infty$ metric: $O(n^2)$ time, $O(n)$ space.
     • $L_1$ ($L_\infty$) "balls" are cross-polytopes (cubes).
     • Main idea: the common intersection of the projections of n $L_1$ or $L_\infty$ "balls", from any viewpoint onto any plane, if not empty, is a convex region bounded by O(1) edges.

  12. Line segment based simplification
     • $O(n^2)$ time and space if each $B_i$ is a convex polytope and the projections $\mathrm{Proj}(B_i, W, p)$ of the $B_i$'s from any point p onto any plane W satisfy the condition:
       • the intersection of the projections $\mathrm{Proj}(B_i, W, p)$ of the $B_i$'s from any point p onto any plane W is a convex polygon of size O(n).
     • The algorithm is similar to the one for the $L_2$ metric.

  13. Line based simplification
     • For n indexed points $P = \{p_1, p_2, \dots, p_n\}$ in $\mathbb{R}^d$, $d \ge 3$, with $O(n^{3 - 3/(\lfloor f(d)/2 \rfloor + 1)} \log^{O(1)} n)$ time and space one can report, for each line $p_i p_j$, $1 \le i < j \le n$, the farthest point $p_k$ with $i < k < j$.
     • $f(d) = O(d^2)$.
     • For protein chains, the radius of each ball $B_i$ takes its value from a small set, and $f(d) = 3^2 = 9$.
     • The minimum-size set P of stabbers can be found with $O(n^{2.4} \log^{O(1)} n)$ time and space.
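As a quick check of how the exponent 2.4 on this slide follows from the general bound, substitute the stated value f(d) = 9 into the exponent:

```latex
3 - \frac{3}{\lfloor f(d)/2 \rfloor + 1}
  = 3 - \frac{3}{\lfloor 9/2 \rfloor + 1}
  = 3 - \frac{3}{4 + 1}
  = 3 - 0.6
  = 2.4
```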

  14. Main idea:
     • Use a constant number of balanced binary tree structures.
     • At each node, construct a farthest-point-from-line data structure, balanced with respect to the number of queries.
     • $O(n^2)$ queries overall.

  15. Experimental Results • Use RMSD to measure the similarity between the original and the simplified chains. • Different number of atoms in the original and the simplified chains.
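The slides do not say how RMSD is computed when the two chains have different numbers of atoms. One plausible reading, shown here purely as an assumption, takes the deviation of each original atom to be its distance to the nearest point on the simplified polyline. The names `point_segment_distance` and `chain_rmsd` are hypothetical, and the coordinates in the example are made up.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab in R^3."""
    ab = [b[k] - a[k] for k in range(3)]
    ap = [p[k] - a[k] for k in range(3)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [a[k] + t * ab[k] for k in range(3)]
    return math.dist(p, closest)

def chain_rmsd(original, simplified):
    """RMSD of original atoms to the simplified polyline.

    Assumption (not from the slides): each atom's deviation is its
    distance to the closest segment of the simplified chain, which
    must contain at least two points.
    """
    total = 0.0
    for p in original:
        d = min(
            point_segment_distance(p, simplified[k], simplified[k + 1])
            for k in range(len(simplified) - 1)
        )
        total += d * d
    return math.sqrt(total / len(original))

# Made-up example: the middle atom lies 1 unit away from the single segment.
original = [(0, 0, 0), (1, 1, 0), (2, 0, 0)]
simplified = [(0, 0, 0), (2, 0, 0)]
print(chain_rmsd(original, simplified))
```

Other conventions, such as superimposing the chains first or matching atoms by index, would give different values, so the numbers on the following slides should not be expected to match this sketch.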

  16. Experimental Results
     • 1CA2: 256 alpha carbons
     • Simplified 1CA2: 168 alpha carbons
     • RMSD = 0.62 Å

  17. Experimental Results
     • 1DDZ_A: 481 alpha carbons
     • Simplified 1DDZ_A: 340 alpha carbons
     • RMSD = 0.44 Å

  18. Experimental Results
     • 1DDZ_B: 481 alpha carbons
     • Simplified 1DDZ_B: 351 alpha carbons
     • RMSD = 0.43 Å

  19. Conclusions
     • The RMSDs are very small: the simplified backbones stay similar to the originals while having significantly simpler representations (e.g., about 33% reduction in size for 1CA2).
     • The simplified chains can be used in place of the original ones in visualization, alignment, classification of protein structures, etc.
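The size reductions behind this conclusion can be recomputed from the alpha-carbon counts reported on slides 16 to 18. The sketch below uses only those counts; the helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(original, simplified):
    """Percentage of points removed by the simplification."""
    return 100.0 * (original - simplified) / original

# (original, simplified) alpha-carbon counts from the experimental slides
chains = {
    "1CA2": (256, 168),
    "1DDZ_A": (481, 340),
    "1DDZ_B": (481, 351),
}

for name, (n_orig, n_simp) in chains.items():
    print(f"{name}: {reduction_percent(n_orig, n_simp):.1f}% fewer alpha carbons")
```

For 1CA2 this gives 88 of 256 points removed, roughly a third, in line with the figure quoted above; the two 1DDZ chains come out somewhat lower.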
