
Protein Classification II



Presentation Transcript


  1. Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman

  2. Outline • Terminology • Classes of protein structures • Why do we need to align structures • Viewing protein structures and • How to recognize structural similarities • Algorithms • Summary

  3. Terminology Tertiary structure: three-dimensional • Class - similar 2° structure - all α, all β, α + β, α/β • Fold - major structural similarity - similar arrangement of 2° structure • Superfamily (topology) - probable common ancestry • Family - clear evolutionary relationship - sequence similarity > 25% • Individual protein

  4. Classes of Protein Structure • Class α • Class β • Class α/β • Class α + β • Multidomain proteins • Membrane and cell surface proteins • And more …

  5. Structure of α class proteins

  6. Structure of β class proteins

  7. Structure of α/β class proteins

  8. Structure of α + β class proteins

  9. Structure of membrane proteins

  10. What is structure alignment? In structural alignment, the three-dimensional structure of one protein domain is superimposed upon that of a second protein domain so as to achieve a minimal RMS deviation between corresponding atoms. The goal is to discover structural similarity.
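
The quantity being minimized can be sketched as follows. This is only an illustration: the function name `rmsd` and the toy coordinates are assumptions, the two point lists are taken to be in one-to-one correspondence already, and the search for the best rotation and translation is not shown.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two corresponding atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy traces (hypothetical): the second is the first shifted by 1 unit along y.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(a, b))
```

A superposition method would move one structure rigidly until this value is as small as possible for the chosen correspondences.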

  11. Why Align Structures • For homologous proteins (similar ancestry), this provides the “gold standard” for sequence alignment--elucidates the common ancestry of the proteins. • For nonhomologous proteins, allows us to identify common substructures of interest. • Allows us to classify proteins into clusters, based on structural similarity.

  12. Evaluating Structural Alignments Factors to be considered: 1. Number of amino acid correspondences created. 2. RMSD of corresponding amino acids. 3. Percent identity in aligned residues. 4. Number of gaps introduced. 5. Size of the two proteins. 6. Conservation of known active site environments. … There are no universally agreed upon criteria. As usual, it depends on what you are using the alignment to do.
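
Three of the listed quantities (number of correspondences, percent identity, and gaps) can be read directly off an alignment. The sketch below assumes the alignment is given as two equal-length gapped strings with `-` marking a gap; the function name `alignment_stats` and the example sequences are made up for illustration.

```python
def alignment_stats(aligned_a, aligned_b):
    """Summarize a pairwise alignment given as two equal-length gapped strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matched = identical = gap_positions = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            gap_positions += 1          # column where one protein has no partner
        else:
            matched += 1                # a residue-residue correspondence
            if res_a == res_b:
                identical += 1
    percent_identity = 100.0 * identical / matched if matched else 0.0
    return {
        "correspondences": matched,
        "percent_identity": percent_identity,
        "gap_positions": gap_positions,
    }

stats = alignment_stats("MKT-AYIAK", "MRTQAY-AK")
print(stats)
```

Percent identity here is computed over aligned residue pairs only; other denominators (for example, the length of the shorter protein) are equally defensible, which is part of why no single criterion is agreed upon.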

  13. Methods (flowchart) 1. Protein sequence 2. Database similarity search 3. Align with known structure? 4. Protein family analysis 5. Relationship to known structure? 6. Structural analysis 7. Predicted structure? 8. Tertiary comparative modeling (gives predicted tertiary structure) 9. Tertiary structure analysis

  14. Viewing Protein Structures • Chime (http://www.umass.edu/microbio/chime/): a Web browser plug-in to display and manipulate structures inside a Web page. • Cn3D (http://www.ncbi.nlm.nih.gov/Structure/): provides viewing of three-dimensional structures from Entrez and MMDB. Cn3D runs on Windows, MacOS, and Unix; simultaneously displays structural and sequence alignments; can show multiple superimposed images from NMR studies. • Mage (http://kinemage.biochem.duke.edu/; see Richardson and Richardson 1994): standard molecular viewing features with animation and kaleidoscope effects. • RasMol (http://www.umass.edu/microbio/rasmol/): most commonly used viewer for Windows, MacOS, UNIX, and VMS operating systems. Performs many functions. • Swiss 3D viewer, Spdbv (http://www.expasy.ch/spdbv/mainpage.html; Guex and Peitsch 1997): protein models can be built by structural alignments; calculates atomic angles and distances, threading, energy minimization, and interacts with the Swiss Model server.

  15. Protein Structure Classification Databases • SCOP -- structural classification of proteins • FSSP -- fold classification and multiple structure alignments • CATH -- structural classification of proteins • MMDB – by VAST program • SARF

  16. Alignment of Protein Structures • Difference: sequence vs. structural similarity. Which is the better indicator of evolutionary relationship? • Structures are more difficult to align than sequences • Similar structures may form by many different foldings of the amino acid Cα • Although the local environments of many molecules in two proteins may be similar, there may also be some local differences.

  17. How to recognize structural similarities 1. By eye 2. Algorithmically • point-based methods use properties of points (distances) to establish correspondences • secondary structure-based methods use vectors representing secondary structures to establish correspondences.

  18. Align Structures by Secondary Structures

  19. Three prototypical methods 1. STRUCTAL uses dynamic programming iteratively to refine an arbitrary starting alignment. 2. DALI uses distance matrices to find similar patterns of distances, indicating correspondences. 3. LOCK uses vectors associated with secondary structures to do a quick screen for similar structures.

  20. STRUCTAL Uses dynamic programming iteratively to refine an arbitrary starting alignment. STEPS: 1. Start with any set of correspondences between two structures (sequence alignment, secondary structure alignment, by eye, random). 2. Compute a score matrix by computing a score between all pairs of points based on their distance. 3. Trace back through the score matrix to find a new set of correspondences that maximizes the score (standard DP) 4. Iterate 2 and 3 until score doesn’t change. Note: heuristic, no guarantees of success, depends on quality of starting structure.
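
Steps 2 and 3 can be sketched as a single pass: build a distance-based score matrix, then run a standard global dynamic program over it. This is a simplified illustration, not the published STRUCTAL code. The function names, the score form `m / (1 + (d/d0)^2)`, and the values of `m`, `d0`, and `gap` are assumptions chosen for the example, and both structures are assumed to sit in a common coordinate frame already (re-superimposing between iterations is omitted).

```python
import math

def score_matrix(coords_a, coords_b, m=20.0, d0=2.24):
    """Score every pair of points by distance: close pairs score near m, far pairs near 0."""
    return [[m / (1.0 + (math.dist(p, q) / d0) ** 2) for q in coords_b]
            for p in coords_a]

def dp_align(scores, gap=10.0):
    """Global DP over a score matrix; returns (best score, list of (i, j) correspondences)."""
    n, m = len(scores), len(scores[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for j in range(1, m + 1):
        best[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j - 1] + scores[i - 1][j - 1],  # pair i with j
                             best[i - 1][j] - gap,                       # skip a point in A
                             best[i][j - 1] - gap)                       # skip a point in B
    # Trace back from the corner to recover the correspondences.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + scores[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs

# Toy example (hypothetical coordinates): B is A with one extra point inserted.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
b = [a[0], a[1], (5.7, 10.0, 0.0), a[2], a[3]]
total, pairs = dp_align(score_matrix(a, b))
print(total, pairs)
```

In a full iteration the new correspondences would be used to superimpose the structures again, the score matrix would be recomputed, and the loop would stop once the score no longer changes.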

  21. Scoring in STRUCTAL Need to find a score that is maximal when the alignment is good (good distances are small). Also may want to include other computable attributes of the point. In the score, M is the maximum score desired, d is the measured value (of distance or some other attribute), and d0 is the value at which the score is 0. All values of d between 0 and d0 get some “credit”, but values greater than d0 are penalized.
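
The formula itself did not survive in the transcript. One form that fits the definitions given here (largest value M for the smallest distances, exactly 0 at the zero-score distance) is S(d) = 2M / (1 + (d/d0)^2) - M. Treating this as the slide's formula is an assumption, and the function name and default values below are illustrative only.

```python
def credit_score(d, max_score=20.0, d0=5.0):
    """Assumed score form: max_score at d = 0, zero at d = d0, negative beyond d0."""
    return 2.0 * max_score / (1.0 + (d / d0) ** 2) - max_score

for d in (0.0, 2.5, 5.0, 10.0):
    print(d, credit_score(d))
```

With this form the score falls smoothly as d grows, so small deviations are only mildly discounted while large ones count against the alignment.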

  22. Distance Matrix • Similar to a dot matrix to identify the atoms that lie most closely together • If two proteins have a similar structure, the graphs of these structures will be superimposable.
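
A small sketch of why distance matrices are convenient for comparison: the matrix depends only on distances within one protein, so rotating or translating the structure leaves it unchanged. The function names and coordinates below are hypothetical.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between all points (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def max_abs_difference(m1, m2):
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(x - y) for row1, row2 in zip(m1, m2) for x, y in zip(row1, row2))

trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.9, 1.2)]
# Same structure rotated 90 degrees about z and then translated.
moved = [(-y + 10.0, x - 2.0, z + 5.0) for x, y, z in trace]

difference = max_abs_difference(distance_matrix(trace), distance_matrix(moved))
print(difference)  # differs from zero only by floating-point rounding
```

Two proteins can therefore be compared matrix against matrix without first finding a superposition.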

  23. DALI Uses distance matrix to find similar patterns of distances, indicating correspondences. STEPS: 1. Systematically look through 2 distance matrices to find pairs of segments with similar pattern of distances. Provides pairs of similar segments. 2. Assemble pairs into larger sets, to maximize the number of atoms and minimize the RMS distance between them. The assembly step is done in a random fashion, since the search space is too large.
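
Step 1 can be illustrated with a brute-force search for matching blocks in the two distance matrices. This is a heavily simplified sketch and not the DALI implementation: the function names, the fragment width, the mean-absolute-difference test, and the tolerance are all assumptions for illustration, and the assembly of step 2 is not shown.

```python
import math

def distance_matrix(coords):
    return [[math.dist(p, q) for q in coords] for p in coords]

def similar_segment_pairs(coords_a, coords_b, width=3, tol=0.5):
    """Return (i, j, k, l) where the width x width distance block between segments
    starting at i and j in A resembles the block between segments k and l in B."""
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    starts_a = range(len(coords_a) - width + 1)
    starts_b = range(len(coords_b) - width + 1)
    hits = []
    for i in starts_a:
        for j in starts_a:
            if j < i + width:          # keep the two segments non-overlapping
                continue
            for k in starts_b:
                for l in starts_b:
                    if l < k + width:
                        continue
                    diff = sum(abs(da[i + u][j + v] - db[k + u][l + v])
                               for u in range(width)
                               for v in range(width)) / width ** 2
                    if diff <= tol:
                        hits.append((i, j, k, l))
    return hits

# Toy hairpin (hypothetical), a translated copy, and a copy with its second half displaced.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
shifted = [(x + 5.0, y - 1.0, z + 2.0) for x, y, z in hairpin]
broken = hairpin[:3] + [(x, y, z + 10.0) for x, y, z in hairpin[3:]]

print(similar_segment_pairs(hairpin, shifted))
print(similar_segment_pairs(hairpin, broken))
```

The four nested loops make this quartic in protein length, which is fine for a toy case and hints at why the full problem needs heuristics.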

  24. DALI

  25. DALI

  26. DALI

  27. Fast Structural Search based on Secondary Structure Analysis Steps for LOCK 1. Define local secondary structures. 2. Find an initial superposition by using DP (and score functions shown) to align secondary structure vectors. 3. Use greedy algorithm to find nearest neighbors and minimize RMSD. 4. Prune the atoms to get core with minimal RMSD
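
The greedy nearest-neighbor idea in step 3 can be sketched as below. This is an illustration under stated assumptions rather than the LOCK algorithm itself: the structures are taken to be roughly superimposed already, the function name `greedy_pairs` and the distance cutoff are hypothetical, and sequence order is ignored.

```python
import math

def greedy_pairs(coords_a, coords_b, cutoff=3.0):
    """Pair atoms greedily, closest pairs first; each atom is used at most once."""
    candidates = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(coords_a)
        for j, q in enumerate(coords_b)
    )
    used_a, used_b, pairs = set(), set(), []
    for d, i, j in candidates:
        if d > cutoff:
            break                      # all remaining candidates are farther apart
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j, d))
    return sorted(pairs)

# Toy coordinates (hypothetical): the third atom of A has no close partner in B.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.5, 0.0, 0.0), (4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
matches = greedy_pairs(a, b)
print(matches)
```

The accepted pairs would then feed an RMSD-minimizing superposition, and pairs that remain far apart could be pruned to leave a well-fitting core.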

  28. Summary • Structural alignment is a key activity, combinatorially expensive, used for: • Gold standard for alignments • Elucidating evolutionary relationships • Creating classifications of protein structure • Multiple methods exist, often based on a basic DP approach, including • Analysis of distances • Analysis of vectors • Combinations of both

  29. Summary • STRUCTAL – dynamic programming using a distance metric • DALI – analysis of distance maps • LOCK – analysis of secondary structure vectors, followed by refinement with distances
