
Structure superposition


kenley

Presentation Transcript


  1. Structure superposition. Structure superposition ≠ structure alignment. Lecture 11. Chapter 16, Gu and Bourne, “Structural Bioinformatics”.

  2. Why?
      • Study the conformational changes of the same protein with or without ligands (same protein sequence).
      • Study the effect of mutations on protein structure (highly similar protein sequences).
      • Assess protein structure prediction: how accurate are the predicted models? (Same protein sequence.)
      • Detect remote homologs. Structures are generally better preserved than sequences over the course of evolution; e.g., myoglobin and β-hemoglobin are homologous and have similar structures, but their sequence identity can be as low as 8.5%.
      • Classify protein folds.

  3. Why? Structure conservation > sequence conservation.
      • Structures may align well even if their sequence similarity is low.
      • Example (figure): an optimal superposition of myoglobin and beta-hemoglobin, which are structural neighbors.
      • However, their sequence identity is only 8.5%.

  4. Why? Structure conservation > sequence conservation. [Figure: receiver operating characteristic (ROC) curves, Chothia and Lesk; x-axis: false positive rate (%), y-axis: true positive rate (%).]

  5. ASIDE: Making sense of a ROC (receiver operating characteristic) curve. ROC experiment:
      • For each pair P of proteins in the dataset, perform an alignment and record its score, S(P).
      • Rank all pairs by score, from highest to lowest.
      • Scan the ranked pairs and, at each step, record the true positive rate and the false positive rate.
      [Figure axes: false positive rate (%) vs. true positive rate (%).]

  6. ASIDE: Making sense of a ROC curve. Ranked predictions and their benchmark labels:

      Prediction   Benchmark
      1.00         Yes
      0.99         Yes
      0.98         Yes
      0.97         Yes
      0.96         No
      0.95         No
      0.93         Yes
      0.91         Yes
      0.89         No
      0.87         No
      0.85         No
      0.83         No
      0.83         Yes
      0.81         No
      0.77         No
      0.74         No
      0.73         No
      0.70         No
      0.69         No
      0.67         Yes
      0.62         No
      0.56         No
      0.54         No
      0.53         No
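The scan described in slides 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to draw the lecture's figure: the function names `roc_points` and `auc` are made up here, the data are the 24 ranked pairs from slide 6, and the two pairs tied at 0.83 are simply taken in the order listed.

```python
# Ranked (score, benchmark) pairs copied from slide 6; True means "Yes".
ranked = [
    (1.00, True), (0.99, True), (0.98, True), (0.97, True),
    (0.96, False), (0.95, False), (0.93, True), (0.91, True),
    (0.89, False), (0.87, False), (0.85, False), (0.83, False),
    (0.83, True), (0.81, False), (0.77, False), (0.74, False),
    (0.73, False), (0.70, False), (0.69, False), (0.67, True),
    (0.62, False), (0.56, False), (0.54, False), (0.53, False),
]


def roc_points(ranked_pairs):
    """Return [(FPR, TPR), ...], one point per step down the ranked list."""
    n_pos = sum(1 for _, label in ranked_pairs if label)
    n_neg = len(ranked_pairs) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked_pairs:
        if label:
            tp += 1   # benchmark says "Yes": a true positive
        else:
            fp += 1   # benchmark says "No": a false positive
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


points = roc_points(ranked)
print(points[4])    # (FPR, TPR) after the first four pairs
print(auc(points))  # 0.5 would be random ranking, 1.0 a perfect one
```

Plotting `points` with the false positive rate on the x-axis and the true positive rate on the y-axis gives the staircase curve; the closer it hugs the top-left corner, the better the score separates true from false pairs.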

  7. Alignment vs. superposition
      • Structural alignment attempts to establish homology between two or more polymer structures based on their shape and 3D structure.
      • Structural alignment requires no a priori knowledge of equivalent positions.
      • Structural alignment is a valuable tool for comparing proteins with low sequence similarity, where evolutionary relationships cannot be easily detected by standard sequence alignment techniques.
      • Conversely, simple structural superposition uses knowledge of at least some equivalent residues to guide a rigid-body superposition.
      • Superposition is the most basic possible comparison between protein structures: it makes no attempt to align the input structures.
      • It requires a precalculated alignment as input to determine which residues are to be included in the RMSD calculation.

  8. Structural superposition of two CheY orthologs. In pairwise structure superposition, a correspondence set of residue pairs is established by a pairwise sequence alignment.

  9. Pairwise structure superposition
      • Superposition algorithms optimize the orientation and spatial position of the two molecules with respect to each other.
      • Superposition usually starts with a sequence comparison, which establishes the one-to-one relationships between pairs of atoms from which the RMSD is computed.
      • This is typically a good assumption at appreciable pairwise sequence identity, but it breaks down in the Twilight Zone.
      • Once atom-to-atom relationships between the two structures are established, the task of the algorithm is to find the superposition with the smallest possible RMSD. It is usually impossible to achieve perfect overlap of all atom pairs, even for structures with 100% identical sequences.
      • Overlaying one pair of atoms perfectly may push another pair of atoms further apart.
      • Also, as in sequence alignment, there is a tension between global and local matching that must be considered.

  10. Global similarity ≠ local similarity. [Figure: global alignment.] Images and content from Patrice Koehl at UC Davis.

  11. Global similarity ≠ local similarity. [Figure: structural motif, local alignment.] Images and content from Patrice Koehl at UC Davis.

  12. Choosing an appropriate description of structure. Structure comparisons can be done at several different levels:
      • Individual atoms (disadvantages?)
      • Residue positions, which can be specified by the coordinates of Cα, Cβ, and the center of mass of the side chains. What are the advantages and disadvantages of using different residue representations?
      • Small fragments
      • Secondary structure elements (SSEs)

  13. Choosing an appropriate description of structure
      • Only when the structures to be aligned are highly similar or even identical is it meaningful to align side-chain atom positions. In that case the RMSD reflects not only the conformation of the protein backbone but also the rotameric states of the side chains.
      • Other comparison criteria that reduce noise and bolster positive matches include:
        - Secondary structure assignment
        - Native contact maps or residue interaction patterns
        - Measures of side-chain packing
        - Measures of hydrogen bond retention

  14. Contact map

  15. Choosing an objective function to extremize. Structure superposition requires minimizing the error within the framework of some objective function. Which one?
      • Torsion angle comparison
      • Distance matrices
      • Structure superposition (RMSD, TM-score, etc.): the most obvious and common choice
      • Secondary structure superposition (SHEBA)
      This decision must also be made for structure alignment, since superposition is used (many times over) in that harder problem.

  16. Torsion angles. Torsion angles (φ, ψ) are:
      - local by nature
      - invariant upon rotation and translation of the molecule
      - compact (O(n) angles for a protein of n residues)
      But… add 1 degree to all φ, ψ. [Figure: the structure before and after; the small local changes accumulate along the chain.] Images and content from Patrice Koehl at UC Davis.
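To make the invariance claim on slide 16 concrete, here is a small sketch that computes a torsion angle from four points and checks that a rigid motion leaves it unchanged. The function name `dihedral_deg` and the four test points are illustrative assumptions, not taken from the lecture; for φ the four atoms would be C(i-1), N, Cα, C, and for ψ they would be N, Cα, C, N(i+1).

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1            # central bond
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Four hypothetical points forming a 90-degree torsion.
pts = np.array([[1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])

# Arbitrary rigid motion: rotate 40 degrees about the x-axis, then translate.
c, s = np.cos(np.radians(40.0)), np.sin(np.radians(40.0))
rot = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
moved = pts @ rot.T + np.array([5.0, -2.0, 3.0])

angle_before = dihedral_deg(*pts)
angle_after = dihedral_deg(*moved)
print(angle_before, angle_after)  # the two values agree up to rounding
```

Because the angle depends only on the relative geometry of the four atoms, two structures can be compared angle by angle without superimposing them first. The price is the one the slide points out: tiny per-residue differences add up along the chain.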

  17. Distance matrices. [Figure: four points labeled 1 to 4 with example pairwise distances 5.9, 8.1, and 6.0.]
      • Advantages
        - invariant with respect to rotation and translation
        - can be used to compare proteins
      • Disadvantages
        - the distance matrix is O(n²) for a protein with n residues
        - comparing distance matrices is a difficult problem
        - insensitive to chirality
      Images and content from Patrice Koehl at UC Davis.
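The properties listed on slide 17 are easy to check numerically. The sketch below builds a distance matrix for random points (stand-ins for Cα coordinates, not real protein data) and shows that it survives a rigid motion unchanged, and also survives mirroring, which is exactly the chirality blind spot. The helper names and the 8 Å contact cutoff are assumptions made for this illustration.

```python
import numpy as np


def distance_matrix(coords):
    """All pairwise Euclidean distances for an (n, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def contact_map(coords, cutoff=8.0):
    """Boolean contact map; the 8 Å cutoff is an assumed, commonly used value."""
    return distance_matrix(coords) < cutoff


rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(6, 3))   # hypothetical coordinates

# Rigid motion: rotation about z plus a translation.
c, s = np.cos(0.7), np.sin(0.7)
rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = coords @ rot.T + np.array([1.0, 2.0, 3.0])

# Mirror image: flipping the sign of x inverts the handedness.
mirrored = coords * np.array([-1.0, 1.0, 1.0])

dm = distance_matrix(coords)
same_after_motion = np.allclose(dm, distance_matrix(moved))
same_after_mirror = np.allclose(dm, distance_matrix(mirrored))
print(same_after_motion, same_after_mirror)
```

Both comparisons come out equal, so a distance matrix (or the contact map derived from it) cannot tell a structure from its mirror image. The n × n storage is also visible directly in `dm.shape`.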

  18. Scoring distance matrix (DM) similarity (or, in this case, contact map similarity)

  19. Scoring DM similarity (or, in this case, contact map similarity): introduce a gap. In superposition, the gap location is defined by a precomputed alignment. In alignment, different gap positions are tried until the best overlap is identified.

  20. Root mean squared deviation (RMSD)
      • The most common parameter that expresses the difference between two protein structures is the RMSD, or root mean squared deviation (distance), in atomic positions between the two structures.
      • RMSD can be calculated over all atoms or over some subset of the atoms, such as the backbone or Cα atoms.
      • Using a subset of the protein atoms is common because two protein structures being compared are usually not identical in sequence, so the only atoms between which a one-to-one comparison of positions can be made are the backbone atoms.

  21. RMSD calculation. [Figure: two five-point chains, numbered 1 to 5, with distances d1 to d5 between corresponding points.] The two structures must first be superimposed to calculate a meaningful RMSD value, because they start out in different coordinate systems.

  22. RMSD calculation (with a gap). Alignment: Blue 1 – 2 – 3 – 4 – 5; Red 1 – 2 – x – 4 – 5. [Figure: distances d1, d2, d4, and d5 between the four corresponding pairs; position 3 is skipped.]
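The calculation on slides 21 and 22 is RMSD = sqrt((1/N) Σ d_i²), where d_i is the distance between the i-th corresponding pair and N is the size of the correspondence set. A minimal sketch follows, assuming the two coordinate sets are already superimposed; the coordinates are invented toy values, and the function name `rmsd` and the 0-based index pairs are conventions chosen for this example.

```python
import numpy as np


def rmsd(coords_a, coords_b, pairs):
    """RMSD over a correspondence set of (index_in_a, index_in_b) pairs.

    Assumes the two structures are already in the same coordinate frame.
    """
    idx_a = [i for i, _ in pairs]
    idx_b = [j for _, j in pairs]
    diff = coords_a[idx_a] - coords_b[idx_b]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy five-point chains (hypothetical values, in Å).
blue = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [2.0, 0.0, 0.0],
                 [3.0, 0.0, 0.0],
                 [4.0, 0.0, 0.0]])
red = blue + np.array([0.0, 1.0, 0.0])   # every point displaced by 1 Å
red[2] = [2.0, 5.0, 0.0]                 # position 3 deviates strongly

all_pairs = [(i, i) for i in range(5)]
gapped_pairs = [(0, 0), (1, 1), (3, 3), (4, 4)]   # position 3 left out

rmsd_all = rmsd(blue, red, all_pairs)
rmsd_gapped = rmsd(blue, red, gapped_pairs)
print(rmsd_all, rmsd_gapped)
```

Leaving the badly matching position out of the correspondence set lowers the RMSD, which is why an RMSD value should always be quoted together with the number of pairs it was computed over.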

  23. RMSD vs. average distance <d> as a function of n. Estimating the RMSD by averaging distances generally gets better as the correspondence set size increases. However, the RMSD is always greater than or equal to <d>, with equality only when all distances are identical.
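The inequality on slide 23 follows from a one-line identity: the gap between the squared RMSD and the squared mean distance is the variance of the distances, which cannot be negative. Here d_i are the N pair distances and ⟨d⟩ is their mean (notation chosen for this note).

```latex
\mathrm{RMSD}^2 - \langle d \rangle^2
  = \frac{1}{N}\sum_{i=1}^{N} d_i^2
    - \Big(\frac{1}{N}\sum_{i=1}^{N} d_i\Big)^{2}
  = \frac{1}{N}\sum_{i=1}^{N}\big(d_i - \langle d \rangle\big)^{2}
  \;\ge\; 0
```

So RMSD = ⟨d⟩ only when every pair is displaced by the same amount, and a few badly fitting pairs inflate the RMSD much more than they inflate the average distance.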

  24. Using RMSD to find the optimal superposition. [Figure: the two five-point chains, numbered 1 to 5, before superposition.]

  25. Superposition is too complicated for manual optimization. [Figure: six trial placements of the two five-point chains.]

  26. Using RMSD to find the optimal superposition. Simplified problem (compared to structure alignment): we know the correspondence between set A and set B. We wish to compute the rigid transformation T that best aligns a1 with b1, a2 with b2, …, aN with bN. The error to minimize is the RMSD over these N pairs. This is an old problem, solved in statistics, robotics, medical image analysis, etc. Images and content from Patrice Koehl at UC Davis.

  27. Using RMSD to find the optimal superposition. A rigid-body transformation T is a combination of a translation t and a rotation R, thus T(x) = Rx + t. The quantity to be minimized is the sum of squared distances between T(a_i) and b_i over all N pairs. The algorithm includes a fair amount of linear algebra (and a little bit of calculus) that is outside the scope of this class. Believe it or not, the algorithm is O(n)! [Figure: representation of the 6 “trivial” degrees of freedom.] Images and content from Patrice Koehl at UC Davis.
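Written out, the quantity on slide 27 is the usual least-squares error. The slide's own formula is not reproduced in this transcript, so the form below is the standard one, with the symbols a_i, b_i, R, t, and N as defined on slides 26 and 27.

```latex
E(R, t) = \frac{1}{N}\sum_{i=1}^{N} \big\lVert R\,a_i + t - b_i \big\rVert^{2},
\qquad
\mathrm{RMSD} = \sqrt{\min_{R,\,t} E(R, t)}
```

For any fixed rotation R, setting the derivative with respect to t to zero gives t = b̄ − R ā, where ā and b̄ are the centroids of the two point sets. That is why the first step of the algorithm is to bring the two centroids together; only the rotation then remains to be solved.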

  28. Using RMSD to find the optimal superposition. Pseudocode for the superposition algorithm in practice:
      1.) Define the error function (RMSD).
      2.) Determine the correspondence set (pairwise sequence alignment).
      3.) Translation: align the centers of mass (COM).
      4.) Rotation: use matrix methods to solve for the rotation that minimizes the error function (a variety of methods is available).
      5.) Evaluate the resulting superposition.
      6.) Refine the superposition (because COM-to-COM may not be the best translation).
      7.) Iterate until convergence.
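Steps 3 to 5 of the pseudocode can be sketched with the SVD-based Kabsch method, which is one of the "variety of methods" for step 4 and not necessarily the one the lecture had in mind. The function name `kabsch_superpose` is made up for this example, unweighted centroids stand in for the centers of mass, and the test coordinates are random numbers. For a fixed correspondence set and unweighted least squares, the centroid translation is already optimal, so this sketch does not iterate; steps 6 and 7 matter when the correspondence set itself is revised, for example by dropping poorly fitting pairs.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Least-squares rigid superposition of paired (n, 3) coordinate arrays.

    Returns the rotation R, the translation t, and the RMSD after applying
    x -> R x + t to every row of `mobile`.
    """
    # Step 3: translation, by moving both centroids to the origin.
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    a = mobile - mobile_center
    b = target - target_center
    # Step 4: rotation from the SVD of the 3x3 covariance matrix.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Guard against an improper rotation (a reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - rotation @ mobile_center
    # Step 5: evaluate the resulting superposition.
    fitted = mobile @ rotation.T + translation
    rmsd = float(np.sqrt(((fitted - target) ** 2).sum(axis=1).mean()))
    return rotation, translation, rmsd


# Usage: rotate and shift a random point set, then recover the overlay.
rng = np.random.default_rng(1)
target = rng.normal(size=(10, 3))
c, s = np.cos(1.1), np.sin(1.1)
true_rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
mobile = target @ true_rot.T + np.array([3.0, -1.0, 2.0])

rotation, translation, rmsd = kabsch_superpose(mobile, target)
print(rmsd)  # essentially zero, since mobile is an exact rigid copy
```

The cost is one pass over the N pairs to build the 3×3 matrix plus a constant-size SVD, which is consistent with the O(n) claim on slide 27.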

  29. Back to our toy model…
      1.) Generate the pairwise alignment:
          123-45
          123456
      2.) Find the optimal superposition (translation, then rotation).
      [Figure: the five-point and six-point chains before and after superposition.]
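Step 1 of the toy model turns a gapped alignment into a correspondence set. A small sketch of that bookkeeping is shown here; the function name `correspondence_set` and the use of 0-based indices are choices made for this example.

```python
def correspondence_set(aligned_a, aligned_b, gap="-"):
    """Map two equal-length gapped strings to (index_a, index_b) pairs.

    Indices are 0-based positions in the ungapped sequences. Columns that
    contain a gap in either sequence contribute no pair.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    i = j = 0
    for char_a, char_b in zip(aligned_a, aligned_b):
        if char_a != gap and char_b != gap:
            pairs.append((i, j))
        if char_a != gap:
            i += 1
        if char_b != gap:
            j += 1
    return pairs


# The toy alignment from slide 29.
pairs = correspondence_set("123-45", "123456")
print(pairs)  # [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5)]
```

Point 4 of the six-point chain has no partner, so it is excluded from both the superposition and the RMSD; the remaining five pairs are what the rotation and translation are fitted to.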

  30. Superposition of a pair of CuZnSOD structures Sequence identity = 83% RMSD = 1.0 Å

  31. Superposition of a pair of CuZnSOD structures Sequence identity = 83% RMSD = 1.0 Å

  32. Superposition of several CuZnSOD structures. <Sequence identity> = 68% ± 35%; <RMSD> = 1.6 Å ± 0.6 Å

  33. Global vs. local superposition in calmodulin. [Figure panels: ligand free; complexed with trifluoperazine.]

  34. Global vs. local superposition in calmodulin. Local alignment: RMSD = 0.9 Å (62 residues). Global alignment: RMSD = 15 Å (143 residues).

  35. By itself, RMSD is not a very useful error function. For example, consider a series of fragments all generated from the blue structure:
      • RMSD = 0.0 Å, aligned = 40, Z-score = 3.7
      • RMSD = 0.0 Å, aligned = 95, Z-score = 17.3
      • RMSD = 0.0 Å, aligned = 101, Z-score = 18.4

  36. Up-weighting secondary structure, etc. Based on the assumption that secondary structure elements should match up better than coil, we can easily modify the RMSD calculation to reflect that: a multiplier is applied to each pair (x1 for secondary structure, x2 for coil, where x1 > x2) to up-weight the important positions. For example, assuming the red dots in the figure correspond to secondary structures, RMSD’ < RMSD, which might be expected to be a more accurate reflection of the similarity between the pair.
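One plausible form of the weighted calculation is RMSD’ = sqrt(Σ w_i d_i² / Σ w_i), with w_i = x1 for pairs in secondary structure elements and w_i = x2 for coil. The slide's exact formula is not preserved in this transcript, so this form, the multiplier values, and the distances below are all assumptions made for illustration.

```python
import numpy as np


def weighted_rmsd(distances, weights):
    """Weighted RMSD: sqrt(sum(w_i * d_i**2) / sum(w_i))."""
    d = np.asarray(distances, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt((w * d ** 2).sum() / w.sum()))


# Hypothetical pair distances (Å): three well-matched pairs in secondary
# structure elements followed by two poorly matched coil pairs.
distances = [0.5, 0.6, 0.4, 3.0, 2.5]
in_sse = [True, True, True, False, False]

x1, x2 = 2.0, 1.0                       # assumed multipliers, x1 > x2
weights = [x1 if flag else x2 for flag in in_sse]

plain = weighted_rmsd(distances, [1.0] * len(distances))
weighted = weighted_rmsd(distances, weights)
print(plain, weighted)  # the weighted value is the smaller of the two
```

With equal weights the function reduces to the ordinary RMSD. The weighted value drops here only because the well-matched pairs happen to be the up-weighted ones; if the secondary structure elements were the badly fitting part, RMSD’ would rise instead.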

  37. Template Modeling Score (TM-score)
      • The TM-score is a measure of similarity between two protein structures with different tertiary structures. It is intended as a more accurate measure of the quality of full-length protein structures than the often-used RMSD.
      • The TM-score expresses the difference between two structures as a score in the range (0, 1], where 1 indicates a perfect match.
      • Generally, scores below 0.20 correspond to randomly chosen unrelated proteins, whereas structures with a score higher than 0.5 have roughly the same fold.
      • The TM-score is designed to be independent of protein length.
      Symbols: d0 = normalization factor; d_i = distance between the i-th pair of residues; L_target, L_aligned = lengths of the target protein and of the alignment.
      Y. Zhang, J. Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004, 57: 702-710.
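Using the symbols listed on slide 37, the published definition is TM-score = max[(1/L_target) Σ 1/(1 + (d_i/d0)²)], summed over the L_aligned residue pairs, with d0 = 1.24 (L_target − 15)^(1/3) − 1.8 Å. The sketch below evaluates the sum for one given superposition only; the real score takes the maximum over superpositions, which is not attempted here. The function name `tm_score`, the 0.5 Å floor on d0 for short chains, and the sample distances are assumptions of this example.

```python
def tm_score(distances, target_length):
    """TM-score term for one fixed superposition.

    distances: d_i (in Å) for each aligned residue pair.
    target_length: L_target, the number of residues in the target protein.
    """
    if target_length <= 15:
        raise ValueError("d0 is undefined for target_length <= 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)   # assumed floor so that d0 stays positive
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


perfect = tm_score([0.0] * 100, 100)   # every residue aligned exactly
half = tm_score([0.0] * 50, 100)       # only half of the target aligned
far = tm_score([20.0] * 100, 100)      # all pairs 20 Å apart
print(perfect, half, far)
```

Two features explain why this behaves better than RMSD: each pair contributes at most 1, so a few badly placed residues cannot dominate, and dividing by L_target rather than L_aligned penalizes alignments that cover only a small part of the target.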

  38. RMSD vs. TM-score. Example 1: RMSD = 12.1 Å, TM-score = 0.81. Example 2: RMSD = 12.5 Å, TM-score = 0.22. Images from Dr. Zhang at KU.
