
Uğur Sezerman

Automatic Function Identification Using the Network Properties Obtained from Graph Representation of Proteins. Uğur Sezerman. MOTIVATION. Common biological function=similar 3D structures Comparison of graphs to find similar sub graphs


Presentation Transcript


  1. Automatic Function Identification Using the Network Properties Obtained from Graph Representation of Proteins Uğur Sezerman

  2. MOTIVATION • Common biological function = similar 3D structures • Comparison of graphs to find similar subgraphs • Discovering native folds and differentiating them from artificially generated proteins • Finding functional domains • Finding structural motifs for function

  3. Background: Graph Matching Algorithms. One isomorphism between the two example graphs is f(a)=1, f(b)=6, f(c)=2, f(d)=4, f(e)=5, f(f)=3. * J. R. Ullmann, An Algorithm for Subgraph Isomorphism, Journal of the Association for Computing Machinery, vol. 23, pp. 31-42, 1976. ** D. C. Schmidt, L. E. Druffel, A Fast Backtracking Algorithm to Test Directed Graphs for Isomorphism Using Distance Matrices, Journal of the Association for Computing Machinery, vol. 23, pp. 433-445, 1976.

  4. INEXACT SUBGRAPH MATCHING Allow for: • Mismatching attribute values (mutations) • Missing nodes (amino acid deletions and/or insertions) • Missing links (contact changes due to conformational rearrangements) Also called error-correcting subgraph isomorphism. The problem is NP-complete.

  5. Representation Methods of Graphs • Delaunay tessellated graphs • Contact maps

  6. Voronoi/Delaunay Tessellation in 2D. A Delaunay simplex is defined by points whose Voronoi polyhedra share a common vertex. A Delaunay simplex is always a triangle in 2D space and a tetrahedron in 3D space. (Voronoi polyhedra may have different numbers of faces and edges.) [Figure panels: Voronoi tessellation; Delaunay tessellation]

  7. Delaunay Simplices* * Taylor T., Vaisman I.I.: Graph theoretic properties of networks formed by the Delaunay tessellation of protein structures. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 73 (2006) 041925

  8. Contact Maps [1,2] • Modelling protein structure as a graph • N×N matrix S • S_ij = 1 if the distance between the Cα atoms of residues i and j is < 6.8 Å [3]; otherwise S_ij = 0 1. Vendruscolo, M., E. Kussel, and E. Domany: Recovery of Protein Structure from Contact Maps. Structure Fold. Des. 2 (1997) 295-306. 2. Fariselli, P. and R. Casadio: A Neural Network Based Predictor of Residue Contacts in Proteins. Protein Eng. 9 (1996) 941-948. 3. A. R. Atilgan, P. Akan, C. Baysal: Small-World Communication of Residues and Significance for Protein Dynamics. Biophys. J. 86 (2004) 85-91

  9. Graph Theoretical Attributes • (k) Connectivity = number of neighbours of a node • (C) Cliquishness = number of contacts between the neighbours of a node (d) / number of all possible contacts between them • S(k) Second Connectivity = sum of the connectivity values of all neighbours of a node
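
The contact map and the three attributes defined above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the presenter's code: the function names `contact_map` and `graph_attributes` are hypothetical, the input is assumed to be an array of Cα coordinates in Å, and cliquishness is set to 0 for nodes with fewer than two neighbours (an assumption, since the slides do not cover that case).

```python
import numpy as np

def contact_map(ca_coords, cutoff=6.8):
    """Binary N x N matrix S with S[i, j] = 1 when Cα atoms i and j are closer than cutoff (Å)."""
    coords = np.asarray(ca_coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    S = (dist < cutoff).astype(int)
    np.fill_diagonal(S, 0)  # a residue is not its own neighbour
    return S

def graph_attributes(S):
    """Return connectivity k, cliquishness C and second connectivity for every node."""
    k = S.sum(axis=1)                       # number of neighbours
    second_k = S @ k                        # sum of the neighbours' connectivities
    # Contacts between the neighbours of i = triangles through i = (S^3)_ii / 2
    links = np.diag(S @ S @ S) / 2
    possible = k * (k - 1) / 2              # all possible neighbour pairs
    # Assumption: C = 0 when a node has fewer than two neighbours
    C = np.divide(links, possible, out=np.zeros(len(k)), where=possible > 0)
    return k, C, second_k

# Toy example: three mutually contacting points plus one pendant point
coords = [(0, 0, 0), (4, 0, 0), (2, 3, 0), (9, 0, 0)]
k, C, second_k = graph_attributes(contact_map(coords))
```

In the toy example the first three points form a triangle and the fourth touches only the second point, so the second point has three neighbours of which only one pair is in contact.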

  10. Centrality Measures • d: degree matrix • σ: shortest path matrix

  11. Establishing Bases of Applications • Potential Use of Graph Theoretical Properties of Protein Structures in Structural Alignment

  12. Network Properties in Structural Alignment • Calculated the difference between the network property values of the CE aligned residues of two protein structures. • Then checked to see whether such a difference could be obtained randomly.

  13. CE Alignment
      Structure Alignment Calculator, version 1.02, last modified: Jun 15, 2001.
      CE Algorithm, version 1.00, 1998.
      Chain 1: pdbdir/12AS.pdb:A (Size=330)
      Chain 2: pdbdir/1PYS.pdb:A (Size=350)
      Alignment length = 211 Rmsd = 3.45A Z-Score = 5.3 Gaps = 125(59.2%) CPU = 15s Sequence identities = 14.2%
      Chain 1:  9 QRQISFVKSHFSRQLEERLGLIEVQAPILSR
      Chain 2:100 LHPITLMERELVEIFRAL-GYQAVEGPEVES
      Table: Calculated Parameter Values
      Table: Part of a CE alignment result between chain A of 12AS and chain A of 1PYS. The calculated values of each graph theoretical property for the bold part are given in Table 1 as an example.

  14. Randomness Check • Shuffling Method • The network values of the first protein are preserved and the existing network values of the second protein are randomly shuffled. • Shifting Method • The network values of the second protein are shifted randomly while the values of the first protein are kept fixed. • Each procedure is repeated 1000 times.
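
The two randomization procedures can be sketched as follows. Several details are assumptions that the slides do not state: the statistic is taken to be the mean absolute difference between the aligned residues' network values, the shift is circular, and the Z-score is the observed statistic standardised against the 1000 randomized ones. The function names are hypothetical.

```python
import numpy as np

def mean_abs_diff(a, b):
    """Assumed statistic: mean absolute difference of aligned network values."""
    return float(np.mean(np.abs(a - b)))

def randomness_z_score(values1, values2, method="shuffle", n_iter=1000, seed=0):
    """Z-score of the observed difference against shuffled or shifted values.

    values1, values2: equal-length sequences (at least two entries) holding one
    network property for the aligned residues of the two proteins.
    Negative scores mean the aligned values are closer than expected by chance.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(values1, dtype=float)
    b = np.asarray(values2, dtype=float)
    observed = mean_abs_diff(a, b)
    null = np.empty(n_iter)
    for t in range(n_iter):
        if method == "shuffle":
            b_rand = rng.permutation(b)          # keep protein 1, shuffle protein 2
        elif method == "shift":
            offset = int(rng.integers(1, len(b)))
            b_rand = np.roll(b, offset)          # assumed: circular shift of protein 2
        else:
            raise ValueError("method must be 'shuffle' or 'shift'")
        null[t] = mean_abs_diff(a, b_rand)
    # Note: undefined if every randomized value is identical (zero spread)
    return (observed - null.mean()) / null.std()

# Toy usage: identical profiles should score far below the random background
profile = np.arange(50)
z_shuffle = randomness_z_score(profile, profile, method="shuffle")
z_shift = randomness_z_score(profile, profile, method="shift")
```

Shuffling destroys the order of the second protein's values entirely, whereas shifting keeps the local order and therefore gives a stricter background for properties that vary smoothly along the chain.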

  15. Data Sets • Capriotti* data set: This data set contains structurally similar proteins which have very low sequence similarity. • Astral 40 data set: 3064 pairs are randomly chosen from a database of structurally similar proteins with low sequence identity. * Capriotti, E., Fariselli, P., Rossi, I. and Casadio, R. (2004) A Shannon entropy-based filter detects high-quality profile-profile alignments in searches for remote homologues. Proteins, 54, 351–360.

  16. TABLE II The Results From Randomly Shuffled Method (Capriotti Dataset: 158 Pairs) TABLE III The Results From Shifted Method (Capriotti Dataset: 158 Pairs)

  17. TABLE IV The Results From Randomly Shuffled Method (Astral 40 Dataset: 3064 Pairs) TABLE V The Results From Shifted Method (Astral 40 Dataset: 3064 Pairs)

  18. TABLE VI Z-Scores For Some Example Pairs From Randomly Shuffled Method (Astral 40 Dataset)

  19. TABLE VII Z-Scores For Some Example Pairs From Shifted Method (Astral 40 Dataset)

  20. Conclusion • 67 of the 3064 protein pairs cannot be explained, because their structural similarities are also too low. TABLE IX The best combination of the properties; the last column shows the number of non-explained pairs

  21. Application I: Structural Alignment • Global and local alignment of protein structures using graph theoretical properties. • We used nine different properties (Table 1. Graph Theoretical Properties). • An affine gap penalty is used for alignment. • Distance Function:
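
A global alignment over per-residue property vectors with an affine gap penalty can be sketched with the standard three-matrix (Gotoh) recurrence. The distance function from the slide was not preserved in this transcript, so the sketch assumes a Euclidean distance between property vectors and scores a pair as `match_bonus` minus that distance; the function name, the scoring rule and all parameter values are hypothetical. Only the optimal score is computed, without traceback.

```python
import numpy as np

def affine_global_score(props_a, props_b, match_bonus=2.0, gap_open=3.0, gap_extend=1.0):
    """Optimal global alignment score for two non-empty (length x n_properties) arrays.

    A gap of length L costs gap_open + (L - 1) * gap_extend.
    """
    a = np.asarray(props_a, dtype=float)
    b = np.asarray(props_b, dtype=float)
    n, m = len(a), len(b)
    # Assumed pair score: bonus minus Euclidean distance between property vectors
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    score = match_bonus - dist

    neg = -np.inf
    M = np.full((n + 1, m + 1), neg)  # residue i aligned with residue j
    X = np.full((n + 1, m + 1), neg)  # residue i of A aligned with a gap
    Y = np.full((n + 1, m + 1), neg)  # residue j of B aligned with a gap
    M[0, 0] = 0.0
    for i in range(1, n + 1):
        X[i, 0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0, j] = -gap_open - (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i, j] = score[i - 1, j - 1] + max(M[i - 1, j - 1], X[i - 1, j - 1], Y[i - 1, j - 1])
            X[i, j] = max(M[i - 1, j] - gap_open, X[i - 1, j] - gap_extend, Y[i - 1, j] - gap_open)
            Y[i, j] = max(M[i, j - 1] - gap_open, Y[i, j - 1] - gap_extend, X[i, j - 1] - gap_open)
    return float(max(M[n, m], X[n, m], Y[n, m]))

# Toy usage with a single property per residue
a = [[1.0], [5.0], [9.0]]
b = [[1.0], [5.0], [7.0], [9.0]]   # same profile with one inserted residue
same = affine_global_score(a, a)
with_gap = affine_global_score(a, b)
```

With several properties on different scales, each column would normally be normalised before taking distances; that step is left out here to keep the sketch short.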

  22. Comparison of Global Alignment Results with CE

  23. Comparison of Local Alignment Results with CE

  24. Application II • Finding functional domains • Functional similarity does not imply sequence similarity. • Two proteins with very low sequence similarity can have the same function, which shows the importance of structural similarity.

  25. Selected Attributes • Degree • Clustering Coefficient • Secondary Structure Similarity • Sequence Similarity (BLOSUM62)

  26. Data Set • Data set created by Capriotti et al. (2004)* • This data set contains structurally similar proteins which have very low sequence similarity. • The globin family was chosen to extend the results. * Capriotti, E., Fariselli, P., Rossi, I. and Casadio, R. (2004) A Shannon entropy-based filter detects high-quality profile-profile alignments in searches for remote homologues. Proteins, 54, 351–360.

  27. Our Approach • Contact map graphs are built for the proteins. • In our approach, we use four dimensions: cliquishness, connectivity, sequence similarity and secondary structure. • The PAM250 matrix is used for sequence similarity. • The secondary structure similarity score is calculated with the similarity matrix proposed by Wallqvist et al.* • If the cliquishness, connectivity and second connectivity values are close according to the intervals we specified, the match is rewarded; otherwise, the match is penalized. * Wallqvist A, Fukunishi Y, Murphy LR, Fadel A, Levy RM. Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases. Bioinformatics. 2000 Nov;16(11):988-1002.
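
The interval rule in the last bullet can be illustrated with a small sketch. The slides do not give the intervals or the reward and penalty values, so every number below is a placeholder, and the sketch assumes that all three properties must fall within their tolerance for a match to be rewarded. The sequence and secondary structure terms are omitted.

```python
def interval_match_score(node_a, node_b, tolerances, reward=1.0, penalty=-1.0):
    """Reward a node pair when every listed property differs by at most its tolerance.

    node_a, node_b: dicts mapping property name to value for one residue each.
    tolerances: dict mapping property name to the maximum allowed difference
                (hypothetical values; the original intervals are not given).
    """
    close = all(abs(node_a[name] - node_b[name]) <= tol for name, tol in tolerances.items())
    return reward if close else penalty

# Placeholder tolerances, chosen only for illustration
tolerances = {"cliquishness": 0.1, "connectivity": 1, "second_connectivity": 5}

res_a = {"cliquishness": 0.50, "connectivity": 6, "second_connectivity": 40}
res_b = {"cliquishness": 0.55, "connectivity": 7, "second_connectivity": 43}
res_c = {"cliquishness": 0.90, "connectivity": 6, "second_connectivity": 40}

score_ab = interval_match_score(res_a, res_b, tolerances)  # all properties within tolerance
score_ac = interval_match_score(res_a, res_c, tolerances)  # cliquishness differs too much
```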

  28. Our Approach • PDB files are parsed, and the correlation coefficient and degree values are calculated for each residue. • These values, together with binding information, are put into a matrix called the “binding residue matrix”. • The initial nodes are chosen among the most heavily connected nodes. • The binding residue matrix and an initial node are sent to each processor to begin its operation.

  29. Results-Globins- Self Match I

  30. Results-Globins- Self Match II

  31. Self Matching 24 Pairs of Domains

  32. Questions • Thank you • ugur@sabanciuniv.edu

  33. Results-Globins- Self Match IV

  34. Results-Globins-Sub Cross Match

  35. Results (Globins Gen. I) * Different parameters were used to extend the results.

  36. Results (Globins Gen. II) * Different parameters were used to extend the results.

  37. Dataset* I *Dataset was created by Capriotti et al. (2004)

  38. Dataset* II *Dataset was created by Capriotti et al. (2004)

  39. Dataset* III *Dataset was created by Capriotti et al. (2004)

  40. Dataset* IV *Dataset was created by Capriotti et al. (2004)

  41. Dataset* V *Dataset was created by Capriotti et al. (2004)
