
MSDfold (SSM) A web service for protein structure comparison and structure searches

Eugene Krissinel. http://www.ebi.ac.uk/msd-srv/ssm/ssmstart.html


Presentation Transcript


  1. MSDfold (SSM): a web service for protein structure comparison and structure searches
     Eugene Krissinel
     http://www.ebi.ac.uk/msd-srv/ssm/ssmstart.html

  2. Structure alignment
     Structure alignment may be defined as the identification of residues occupying “equivalent” geometrical positions.
     • Unlike in sequence alignment, residue type is neglected
     • Used for:
       • measuring structural similarity
       • protein classification and functional analysis
       • database searches

  3. Methods
     • Many methods are known:
       • Distance matrix alignment (DALI, Holm & Sander, EBI)
       • Vector alignment (VAST, Bryant et al., NCBI)
       • Depth-first recursive search on SSEs (DEJAVU, Madsen & Kleywegt, Uppsala)
       • Combinatorial extension (CE, Shindyalov & Bourne, SDSC)
       • Dynamic programming on Cα (Gerstein & Levitt)
       • Dynamic programming on SSEs (SSA, Singh & Brutlag, Stanford University)
       • many others
     • SSM employs a 2-step procedure:
       • Initial structure alignment and superposition using SSE graph matching
       • Cα-alignment

  4. Graph representation of SSEs
     E. M. Mitchell et al. (1990) J. Mol. Biol. 212:151
     SSE graphs differ from conventional chemical graphs only in that they are labelled by vectors of properties. In graph matching, the labels are compared with tolerances chosen empirically.
     [Figure: two SSEs with geometric labels r1, r2, a1, a2, t and L]

  5. SSE graph matching
     Matching the SSE graphs yields a correspondence between secondary structure elements, that is, groups of residues. The correspondence may be used as an initial guess for structure superposition and alignment of individual residues.
     [Figure: SSE graphs of structures A and B, with helices (H1, H2, ...) and strands (S1, S2, ...) as vertices]

  6. Cα-alignment
     • SSE-alignment is used as an initial guess for Cα-alignment
     • Cα-alignment is an iterative procedure based on the expansion of shortest contacts at the best superposition of structures
     • Cα-alignment is a compromise between the alignment length Nalign and r.m.s.d. The longest contacts are unmapped in order to maximise the Q-score.
     [Figure: chain A and chain B with matched helices and matched strands]
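     The Q-score formula itself is missing from the transcript. As an assumption taken from the Krissinel & Henrick (2004) paper cited on slide 19, not from this slide, it is usually written as below, where N_1 and N_2 are the residue counts of the two chains and R_0 is an empirical parameter (3 Å in that paper). This form reproduces the Q-scores listed on slide 16 to within rounding.

```latex
Q = \frac{N_{\mathrm{align}}^{2}}
         {\left(1 + \left(\mathrm{RMSD}/R_{0}\right)^{2}\right) N_{1} N_{2}}
```

     Q reaches 1 only for identical structures (RMSD = 0 and N_align = N_1 = N_2) and decreases both when the alignment gets shorter and when the r.m.s.d. grows, which is the compromise described above.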

  7. Multiple structure alignment
     • More than 2 structures are aligned simultaneously
     • Multiple alignment is not equal to the set of all-to-all pairwise alignments
     • Helps to identify common structure motifs for a whole family of structures

  8. Iterative removal of non-aligning SSEs
     • Helices: the best pairwise alignments may be multiply aligned from pairwise relations
     • Strands: the best pairwise alignments do not multiply align, but one can still try to align them by probing alternative (not best) alignments
     [Figure: structures A, B and C with their best pairwise alignments]

  9. Iterative removal of non-aligning SSEs
     4 alternative pairwise alignments make up to 4 multiple alignments:
     • A1 - B1 - C1
     • A1 - B2 - C1
     • A2 - B1 - C1
     • A2 - B2 - C1
     The complexity of probing all such combinations is prohibitive for many structures.
     [Figure: structures A, B and C with alternative SSE matches 1 and 2]

  10. Iterative removal of non-aligning SSEs
      Heuristics: remove the non-aligning SSE with the lowest alignment score and reiterate all alignments.
      Flowchart:
      • Start
      • Calculate all-to-all pairwise alignments
      • Are there non-aligning SSEs?
        • Yes: remove the one non-aligning SSE with the lowest score and recalculate the alignments
        • No: quit
      [Figure: structures A, B and C with alternative SSE matches 1 and 2]
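      The control flow of this heuristic can be sketched as follows. This is a toy illustration, not SSM's implementation: `find_non_aligning` is a hypothetical callback standing in for the all-to-all pairwise alignment step, and the toy rule used here (an SSE "aligns" only if its label occurs in every structure) and the `TOY_SCORES` values are assumptions made purely so the loop can run.

```python
def iterative_removal(structures, find_non_aligning):
    """Repeatedly drop the lowest-scoring non-aligning SSE.

    structures: dict mapping structure name -> set of SSE labels.
    find_non_aligning: hypothetical callback; given the current structures it
        returns a list of (score, structure_name, sse_label) tuples for SSEs
        that do not multiply align (it stands in for recomputing all
        pairwise alignments).
    Returns the trimmed structures and the list of removed (name, sse) pairs.
    """
    current = {name: set(sses) for name, sses in structures.items()}
    removed = []
    while True:
        offenders = find_non_aligning(current)
        if not offenders:
            return current, removed
        # Remove exactly one SSE per iteration: the one with the lowest score.
        score, name, sse = min(offenders)
        current[name].discard(sse)
        removed.append((name, sse))


# Toy data (assumed): per-SSE alignment scores.
TOY_SCORES = {"H1": 0.9, "S1": 0.8, "S2": 0.3, "H2": 0.5}


def toy_non_aligning(structures):
    """Toy rule: an SSE aligns only if its label is present in all structures."""
    common = set.intersection(*structures.values())
    return [
        (TOY_SCORES[sse], name, sse)
        for name, sses in structures.items()
        for sse in sses
        if sse not in common
    ]


toy = {"A": {"H1", "S1", "S2"}, "B": {"H1", "S1", "H2"}, "C": {"H1", "S1"}}
trimmed, removed = iterative_removal(toy, toy_non_aligning)
print(trimmed, removed)
```

      Removing one SSE at a time and recomputing, rather than dropping all non-aligning SSEs at once, mirrors the "remove one ... and reiterate" wording of the slide.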

  11. Multiple Cα refinement: central star & consensus
      Flowchart:
      • Multiple SSE alignment
      • Initial Cα alignment
      • Choose the structure closest to X as the central star and align all the rest to it
      • Superpose structures and calculate the consensus structure X
      • Unmap groups of atoms with the highest distance score D in order to maximise the score
      • Score improved?
        • Yes: iterate again
        • No: quit
      [Figure: structures A, B and C around the consensus structure X]

  12. Pairwise Alignment vs. Multiple Alignment
      • The best pairwise alignment of 1SAR:A and 1D1F:B includes only the β-sheet
      • Adding 1MGW:A (a close neighbour of 1SAR:A) reveals a common motif of β-sheet and α-helix

  13. SSM server map http://www.ebi.ac.uk/msd-srv/ssm

  14. SSM output
      • Table of matched Secondary Structure Elements
      • Table of matched backbone Cα-atoms with distances between them at best structure superposition
      • Rotation-translation matrix of best structure superposition
      • Visualisation in Jmol and Rasmol
      • r.m.s.d. of Cα-alignment
      • Length of Cα-alignment, Nalign
      • Number of gaps in Cα-alignment
      • Quality score Q
      • Statistical significance scores P(S), Z
      • Sequence identity
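      To show how a rotation-translation matrix and the r.m.s.d. relate, here is a minimal pure-Python sketch. The 3x4 `[R | t]` row layout, the function names and the toy coordinates are assumptions for illustration; the slides do not show the actual layout of the SSM output.

```python
import math


def apply_transform(matrix, point):
    """Apply a 3x4 [R | t] matrix to a 3D point: returns R * point + t."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Toy example (assumed data): rotate 90 degrees about z, then shift by +1 along x.
MATRIX = [
    [0, -1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
moving = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
fixed = [(1, 1, 0), (0, 0, 0), (1, 0, 1)]

superposed = [apply_transform(MATRIX, p) for p in moving]
print(rmsd(superposed, fixed))
```

      In practice only the matched Cα pairs from the alignment table would be passed to `rmsd`, after the moving chain has been transformed.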

  15. Statistical significance of alignments
      • P-value is estimated using Q-scores of SSE deviations
      • P(S) is the probability of getting a score equal to S or higher when picking structures from the PDB at random
      • P(S) is calibrated on SCOP folds
      • P(S) is often expressed through the Z-score
      [Figure: SSE deviations x1, ..., xi, ..., xn]
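      The definition of P(S) and its relation to a Z-score can be illustrated with a short sketch. It is only an illustration of the definitions: SSM estimates P(S) from SSE-level quantities calibrated on SCOP folds, not by counting background scores, and the one-sided Gaussian convention used in `z_from_p` is an assumption, since the slides do not state which convention SSM uses.

```python
from statistics import NormalDist


def empirical_p_value(background_scores, s):
    """Fraction of background (random) scores that are equal to s or higher."""
    if not background_scores:
        raise ValueError("need at least one background score")
    return sum(1 for x in background_scores if x >= s) / len(background_scores)


def z_from_p(p):
    """Z such that a standard normal variable exceeds Z with probability p.

    Assumes a one-sided Gaussian tail; other conventions exist.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p)


# Toy background scores (assumed data).
background = [0.1, 0.2, 0.3, 0.4]
p = empirical_p_value(background, 0.3)
print(p, z_from_p(p))
```

      Smaller P(S) maps to larger Z, so a high Z-score and a low P-value express the same statement about significance.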

  16. Scoring at low structural similarity: 1KNO:A vs SCOP 1.61
      • Maximal Q-score: d1di2a_ (69 res), Q-score 0.213, RMSD 2.43, Nalign 67/184, P 0.55
      • Lowest RMSD: d1emn_1 (43 res), Q-score 0.019, RMSD 0.9, Nalign 13/184, P 0.075
      • Highest Nalign: d1elxb_ (449 res), Q-score 0.02, RMSD 5.82, Nalign 89/184, P ~1
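      The three rows can be recomputed with a small sketch. The formula and R0 = 3 Å are assumptions taken from Krissinel & Henrick (2004), cited on slide 19, and 184 is assumed to be the length of the query chain 1KNO:A (read from "Nalign 67/184"); the function name is hypothetical.

```python
def q_score(n_align, rmsd, n_res_1, n_res_2, r0=3.0):
    """Q = N_align^2 / ((1 + (RMSD / R0)^2) * N1 * N2), with R0 in angstroms."""
    if n_res_1 <= 0 or n_res_2 <= 0:
        raise ValueError("chain lengths must be positive")
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n_res_1 * n_res_2)


QUERY_LENGTH = 184  # assumed length of 1KNO:A

# (label, target, N_align, RMSD, target length) as listed on the slide
ROWS = [
    ("Maximal Q-score", "d1di2a_", 67, 2.43, 69),
    ("Lowest RMSD", "d1emn_1", 13, 0.9, 43),
    ("Highest Nalign", "d1elxb_", 89, 5.82, 449),
]

for label, target, n_align, rmsd, n_target in ROWS:
    q = q_score(n_align, rmsd, n_target, QUERY_LENGTH)
    print(f"{label:16s} {target}  Q = {q:.3f}")
```

      The recomputed values agree with the listed Q-scores to within rounding, and they show why Q is used for ranking: the lowest-RMSD hit and the longest alignment both score far below the hit that balances the two.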

  17. Performance data
      [Figure: performance data chart]

  18. Sequence and Structure Alignments
      Sequence alignment
      • Based on residue identity, sometimes with a modified alphabet
      • Used for:
        • evolution studies
        • protein function analysis
        • guessing on structure similarity
      • Algorithms: dynamic programming + heuristics
      • Applications: BLAST, FASTA, FLASH and others
      Structure alignment
      • Based on geometrical equivalence of residue positions, residue type disregarded
      • Used for:
        • protein function analysis
        • some aspects of evolution studies
      • Algorithms: dynamic programming, graph theory, MC, geometric hashing and others
      • Applications: DALI, VAST, CE, MASS, SSM and others
      Example alignment:
      --AARNEDDDGKMPSTF-L
      E-AARNFG-DGK--STFIL
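      Sequence identity for an aligned pair such as the example on this slide can be computed as sketched below. Dividing by the number of aligned (gap-free) positions is one common convention and an assumption here; other denominators, such as the length of the shorter chain, give different percentages.

```python
def sequence_identity(row_a, row_b, gap="-"):
    """Fraction of identical residues among positions aligned without gaps."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    identical = sum(1 for a, b in aligned if a == b)
    return identical / len(aligned)


# The two rows shown on the slide.
ROW_A = "--AARNEDDDGKMPSTF-L"
ROW_B = "E-AARNFG-DGK--STFIL"

print(f"{sequence_identity(ROW_A, ROW_B):.1%}")
```

      The same function applies whether the rows come from a sequence alignment or from a structure alignment; only the way the rows were produced differs.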

  19. Sequence and Structure Identity
      20% of identical residues are very often sufficient for chains to be structurally similar.
      E. Krissinel & K. Henrick (2004), Acta Cryst. D60, 2256-2268
      [Figure: plot with the region of good structure similarity marked]

  20. Sequence identity within structure families
      Given that A is similar to B at 20% identity and B is similar to C at 20%, is A similar to C at 20% or more? Naively, one might expect so.
      OK, 20% sequence identity is not a necessary condition for structural similarity. How distant may the sequences within a structure family be?
      [Figure: structures A, B and C, with 20% identity for A-B and B-C and "20%?" for A-C]

  21. Sequence identity within structure families: case A
      Aligned residues are structurally conserved through the family. This is a typical assumption for multiple sequence alignment.
      • Implications:
        • Protein folds are controlled by certain residue types and/or subsequences
        • Protein structure and therefore function are clearly sequence-related
      [Figure: structures A, B and C with conserved HIS, CYS and TRP residues]

  22. Sequence identity within structure families: case B
      Aligned residues are not conserved through the family.
      • Implications:
        • Protein folds are not controlled by any particular residue types and/or subsequences
        • Many different sequences may fold into similar structures
        • Protein structure and therefore function are not clearly sequence-related
      [Figure: structures A, B and C with HIS, CYS and TRP residues conserved only pairwise]

  23. Sequence identity within structure families: case B
      This case may be identified by multiple structure alignment only. Multiple sequence alignment will always find and superpose short fragments:
      -----AFRNEDDDGGKPSTFKL
      EAARNAF-------GKKSTFIL
      EAARNAFDGKMTBIGK------
      [Figure: structures A, B and C with HIS, CYS and TRP residues]

  24. Multiple alignment of SCOP folds
      SCOP database
      • Structure-related hierarchy
      • Manually curated
      • 11 classes, 945 folds, 1539 superfamilies, 2845 families, 70859 domains
      Multiple structure alignment of domains in SCOP folds
      • Sound structure resemblance within folds
      • Wide sequence variations
      • Sequence redundancy cut-off at 50%

  25. Sequence identity in SCOP folds
      • Multiple sequence conservation (case A): average multiple sequence identity 12%
      • Pairwise sequence conservation (case B): average pairwise sequence identity 19%
      [Figure: distributions for case A and case B]
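      The difference between pairwise and multiple sequence conservation can be made concrete with a short sketch. The function names are hypothetical, the rows are the three-sequence example from slide 23, and counting a column as conserved only when it is gap-free and identical in every row is an assumption about how "multiple sequence identity" is defined.

```python
from itertools import combinations


def pairwise_identity(row_a, row_b, gap="-"):
    """Identity over positions where both rows have a residue."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)


def conserved_columns(rows, gap="-"):
    """Indices of gap-free columns holding the same residue in every row."""
    return [
        i for i, column in enumerate(zip(*rows))
        if gap not in column and len(set(column)) == 1
    ]


def mean_pairwise_identity(rows):
    """Average of pairwise identities over all pairs of rows."""
    scores = [pairwise_identity(a, b) for a, b in combinations(rows, 2)]
    return sum(scores) / len(scores)


ROWS = [
    "-----AFRNEDDDGGKPSTFKL",
    "EAARNAF-------GKKSTFIL",
    "EAARNAFDGKMTBIGK------",
]

print(conserved_columns(ROWS))
print(mean_pairwise_identity(ROWS))
```

      A column conserved across all rows is necessarily identical in every pair, but not the other way round, which is why the multiple identity cannot exceed the pairwise figures.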

  26. Residue conservation
      Odds are calculated as the ratio of observed to expected probabilities of obtaining identical-residue substitutions.
      Henikoff, S. and Henikoff, J. G. (1992) Proc. Natl. Acad. Sci. 89, p. 10915.
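      The slide's own formula is missing from the transcript. As a hedged sketch, the BLOSUM-style log-odds calculation from the cited Henikoff & Henikoff paper can be written as below; whether the slides use base-2 logarithms or another scaling is not known, and the function name and toy counts are assumptions.

```python
import math


def log_odds(pair_counts):
    """Log2 odds of observed vs expected frequencies for unordered residue pairs.

    pair_counts: dict {(a, b): count} of aligned residue pairs; (a, b) and
    (b, a) are treated as the same pair. Pairs with zero count are skipped.
    """
    total = sum(pair_counts.values())
    observed = {}
    for (a, b), count in pair_counts.items():
        if count > 0:
            key = tuple(sorted((a, b)))
            observed[key] = observed.get(key, 0.0) + count / total

    # Background frequency of residue a: q_aa + sum over b != a of q_ab / 2.
    background = {}
    for (a, b), q in observed.items():
        if a == b:
            background[a] = background.get(a, 0.0) + q
        else:
            background[a] = background.get(a, 0.0) + q / 2
            background[b] = background.get(b, 0.0) + q / 2

    scores = {}
    for (a, b), q in observed.items():
        # Expected frequency under independence: p_a^2 or 2 * p_a * p_b.
        expected = background[a] ** 2 if a == b else 2 * background[a] * background[b]
        scores[(a, b)] = math.log2(q / expected)
    return scores


# Toy counts (assumed data): identities are over-represented.
toy_counts = {("A", "A"): 6, ("A", "L"): 2, ("L", "L"): 2}
scores = log_odds(toy_counts)
print(scores)
```

      A positive score on the diagonal means the residue is conserved more often than chance would predict; a score near zero means identity at that residue carries little information.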

  27. Residue conservation
      Reference data from Naor D. et al. (1996). J. Mol. Biol. 256, p. 924.

  28. Log odds matrix for SCOP folds
      Hydropathy index by Kyte, J. and Doolittle, R. F. (1982). J. Mol. Biol. 157, p. 105.

  29. Sequence vs “hydropathy” identity in SCOP folds
      • Multiple sequence conservation (case A): average multiple sequence identity 12%
      • Pairwise sequence conservation (case B): average pairwise sequence identity 19%
      • Hydropathy conservation: average “hydropathy” identity 68%
      [Figure: distributions for case A, case B and hydropathy conservation]
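      A “hydropathy” identity can be sketched by comparing residue classes instead of residue types. The index values are the Kyte & Doolittle (1982) scale cited on slide 28; splitting residues into two classes by the sign of the index is an assumption, since the slides do not say how the classes were drawn, and the aligned pair reused here is the example from slide 18.

```python
# Kyte-Doolittle hydropathy index for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_class(residue):
    """Assumed two-class split: positive index = hydrophobic, else hydrophilic."""
    return "hydrophobic" if KYTE_DOOLITTLE[residue] > 0 else "hydrophilic"


def identities(row_a, row_b, gap="-"):
    """Return (sequence identity, hydropathy identity) over gap-free positions."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0.0, 0.0
    same_residue = sum(1 for a, b in pairs if a == b)
    same_class = sum(
        1 for a, b in pairs if hydropathy_class(a) == hydropathy_class(b)
    )
    return same_residue / len(pairs), same_class / len(pairs)


seq_id, hyd_id = identities("--AARNEDDDGKMPSTF-L", "E-AARNFG-DGK--STFIL")
print(f"sequence identity {seq_id:.1%}, hydropathy identity {hyd_id:.1%}")
```

      Because identical residues always share a class, the hydropathy identity is never lower than the sequence identity; the interesting quantity is how much higher it is.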

  30. What is 20% sequence identity?
      Consider an idealized model, where all residues are indiscriminately substituted by like-hydropathic residues only:
      • 10 hydrophilic residues
      • 10 hydrophobic residues
      [Figure: count matrix, with total counts in the upper triangle, and the expected sequence identity]

  31. Conclusion
      • It is quite possible that residue identity plays a much less significant role in protein structure than often believed
      • As a consequence, the role of residue identity in protein function may often be overestimated
      • Using sequence identity for the assessment of structural or functional features may give more false negatives than expected
      • Physical-chemical properties of residues should be given preference over residue identity in structure and function analysis
      • Modern methods for structure alignment are efficient; there is little sense in using sequence alignment in structure-related studies
      Acknowledgement. This work has been supported by research grant No. 721/B19544 from the Biotechnology and Biological Sciences Research Council (BBSRC) UK.
