
BMI 731


hgiordano


Presentation Transcript


  1. BMI 731 Protein Structures and Related Database Searches

  2. Protein DNA (Genotype) Biology … Protein…

  3. A single amino acid substitution in a protein causes sickle-cell disease…

  4. What the.....!?

  5. Why do we care about structure? • In the factory of living cells, proteins are the workers, performing a variety of biological tasks. • Each protein has a particular 3-D structure that determines its function. • Protein structure is more conserved than protein sequence, and more closely related to function. • Sequence -> Structure -> Function

  6. Structural Information • Protein Data Bank (PDB): maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) • http://www.rcsb.org/pdb/ • > 15,000 structures of proteins • Also contains structures of protein/nucleic acid complexes, nucleic acids, and carbohydrates • Most structures are determined by X-ray crystallography. Other methods are NMR and electron microscopy (EM). Some structures are also theoretically predicted.

  7. PDB Content Growth

  8. Protein? • Proteins are linear heteropolymers: one or more polypeptide chains • Building blocks: 20(?) amino acid residues. • Chain lengths range from a few tens to thousands of residues • The three-dimensional shapes (“folds”) adopted vary enormously.

  9. Structure…

  10. Structure cont…

  11. Basic measurements on structures… • Bond lengths • Bond angles • Dihedral (torsion) angles

  12. Bond Length • The distance between bonded atoms is approximately constant • Depends on the “type” of the bond • Varies from about 1.0 Å (C-H) to 1.5 Å (C-C) • BOND LENGTH IS A FUNCTION OF THE POSITION OF TWO ATOMS.
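As a minimal sketch of "bond length is a function of the position of two atoms", the snippet below computes the Euclidean distance between two coordinate triples. The function name `bond_length` and the example coordinates are illustrative assumptions, not taken from the slides or from any PDB entry.

```python
import math

def bond_length(atom_a, atom_b):
    """Distance in Å between two atoms given as (x, y, z) tuples."""
    return math.dist(atom_a, atom_b)

# Hypothetical coordinates: a carbon at the origin and a hydrogen
# placed 1.09 Å away along the x axis.
carbon = (0.0, 0.0, 0.0)
hydrogen = (1.09, 0.0, 0.0)
print(round(bond_length(carbon, hydrogen), 2))  # 1.09
```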

  13. Bond Angle… • Bond angles are determined by the chemical makeup of the atoms involved, and are approximately constant. • Depends on the type of atom and the number of electrons available for bonding. • Ranges from 100° to 180° • BOND ANGLE IS A FUNCTION OF THE POSITION OF THREE ATOMS.
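A bond angle can be sketched as the angle at the central atom between the two bond vectors that leave it. The helper name `bond_angle` and the coordinates are assumptions made for illustration only.

```python
import math

def bond_angle(atom_a, atom_b, atom_c):
    """Angle in degrees at atom_b, formed by bonds b-a and b-c."""
    v1 = [p - q for p, q in zip(atom_a, atom_b)]
    v2 = [p - q for p, q in zip(atom_c, atom_b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))

# Hypothetical geometry: two bonds at right angles, then a linear arrangement.
print(round(bond_angle((1, 0, 0), (0, 0, 0), (0, 1, 0)), 1))   # 90.0
print(round(bond_angle((1, 0, 0), (0, 0, 0), (-1, 0, 0)), 1))  # 180.0
```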

  14. Dihedral Angles • These are usually variable • Range from 0-360° in molecules • Most famous are φ, ψ, ω and χ • DIHEDRAL ANGLES ARE A FUNCTION OF THE POSITION OF FOUR ATOMS. http://www.colby.edu/chemistry/OChem/DEMOS/dihedral.html

  15. Dihedral Angles A torsion angle is defined by 4 atoms, A, B, C and D. When atoms A, B, C and D are main-chain atoms (i.e., the carboxylic carbon, C1; the alpha carbon, C2 or C-alpha; and the amide group nitrogen, N), there are three repeating torsion angles along the backbone chain, called phi, psi and omega. http://bmbiris.bmb.uga.edu/wampler/tutorial/prot2.html
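The four-atom definition of a torsion angle can be sketched with the common atan2 formulation over the three bond vectors A→B, B→C and C→D. The function and helper names are illustrative assumptions. The result is reported in the range -180° to 180°, so an angle of -90° here corresponds to 270° on a 0-360° scale.

```python
import math

def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dihedral(a, b, c, d):
    """Signed torsion angle in degrees about the B-C bond."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)  # normal of the plane through A, B, C
    n2 = _cross(b2, b3)  # normal of the plane through B, C, D
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates with the B-C bond along the x axis.
A, B, C = (0, 1, 0), (0, 0, 0), (1, 0, 0)
print(round(dihedral(A, B, C, (1, 1, 0)), 1))  # 0.0  (cis)
print(round(dihedral(A, B, C, (1, 0, 1)), 1))  # 90.0
```

Applied to consecutive backbone atoms, the same function would give phi, psi and omega, depending on which four atoms are passed in.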

  16. Ramachandran / phi-psi plot http://www.biochem.ucl.ac.uk/~roman/procheck/manual/examples/plot_01.html

  17. Levels of Structure… 1 - Primary structure 2 - Secondary structure 3 - Tertiary structure 4 - Quaternary structure

  18. Primary structure… • This is simply the amino acid sequences of polypeptide chains

  19. Secondary structure • Local organization of protein backbone: α-helix, β-strand (which assemble into β-sheet), turn and interconnecting loop.

  20. The -helix • One of the most closely packed arrangement of residues. • Turn: 3.6 residues • Pitch: 5.4 Å/turn

  21. The -sheet • Backbone almost fully extended, loosely packed arrangement of residues.

  22. Ramachandran/phi-psi plot

  23. Tertiary structure… • Packing the secondary structure elements into a compact spatial unit • “Fold” or domain: this is the level to which structure prediction is currently possible.

  24. Quaternary structure… • Assembly of homo- or heteromeric protein chains. • Usually the functional unit of a protein, especially for enzymes

  25. Classification… • Class • Fold/Architecture • Superfamily

  26. Databases of structural classification • SCOP • Murzin AG, Brenner SE, Hubbard T, Chothia C • Structural Classification of Proteins • Manual assembly by inspection • All nodes are annotated (e.g., all-α, α/β) • Structural similarity search using 3dSearch (Singh and Brutlag) • CATH • Dr. C.A. Orengo, Dr. A.D. Michie, etc. • Class-Architecture-Topology-Homologous superfamily • Manual classification at Architecture level • Automated topology classification using the SSAP algorithm • No structural similarity search

  27. Databases of structural classification • FSSP • L.L. Holm and C. Sander • Fully automated using the DALI algorithm (Holm and Sander) • No internal node annotations • Structural similarity search using DALI • Pclass • A. Singh, X. Liu, J. Chang, D. Brutlag • Fully automated using the LOCK and 3dSearch algorithms • All internal nodes automatically annotated with common terms • Java-based classification browser • Structural similarity search using 3dSearch

  28. Why Structure Alignment? • For homologous proteins (similar ancestry), this provides the “gold standard” for sequence alignment—elucidates the common ancestry of the proteins. • For nonhomologous proteins, allows us to identify common substructures of interest. • Allows us to classify proteins into clusters, based on structural similarity.

  29. How do we recognize structural similarities? • By eye (Alexei Murzin): SCOP, the gold standard for structure classification! • Algorithmically: growth of the PDB demands automated techniques for classification and fold detection

  30. Algorithms for Structure Alignment • Distance based methods • DALI (Holm and Sander): Aligning scalar distance plots • STRUCTAL (Gerstein and Levitt): Dynamic programming using pairwise inter-molecular distances • SSAP (Orengo and Taylor): Dynamic programming using intra-molecular vector distances • Vector based methods • VAST (Bryant): Graph theory based secondary structure alignment • 3dSearch (Singh and Brutlag): Fast secondary structure index lookup • Both vector and distance based • LOCK (Singh and Brutlag): Hierarchically uses both secondary structure vectors and atomic distances

  31. DALI • Based on aligning 2-D intra-molecular distance matrices • Computes the best subset of corresponding residues from the two proteins such that similarity between the 2-D distance matrices is maximized. • Searches through possible alignments of residues using a Monte Carlo algorithm
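To make the idea of comparing intra-molecular distance matrices concrete, the sketch below builds a distance matrix per structure and measures how much the matrices disagree over a given set of residue correspondences. This is an illustration only: the scoring function `matrix_difference` is a simple mean absolute difference chosen for clarity, not DALI's actual similarity score, and the Monte Carlo search over alignments is omitted. All names and coordinates are hypothetical.

```python
import math

def distance_matrix(coords):
    """All-against-all distances within one structure (list of (x, y, z))."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def matrix_difference(coords_a, coords_b, pairs):
    """Mean absolute difference between corresponding intra-molecular
    distances, for aligned residue index pairs [(i_a, i_b), ...]."""
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    total, count = 0.0, 0
    for x in range(len(pairs)):
        for y in range(x + 1, len(pairs)):
            (ia, ib), (ja, jb) = pairs[x], pairs[y]
            total += abs(da[ia][ja] - db[ib][jb])
            count += 1
    return total / count if count else 0.0

# Hypothetical structure and a translated copy of it: the internal
# distances are unchanged, so the matrices agree without any superposition.
struct_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
struct_b = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in struct_a]
alignment = [(0, 0), (1, 1), (2, 2)]
print(round(matrix_difference(struct_a, struct_b, alignment), 6))  # 0.0
```

Because the comparison uses only distances within each molecule, it does not depend on how either structure is positioned or oriented in space.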

  32. VAST-Vector Alignment Search Tool • Aligns only secondary structure elements (SSEs) • Represents each SSE as a vector • Finds all possible pairs of vectors from the two structures that are similar • Uses a graph-theory algorithm to find the maximal subset of similar vectors • Overall alignment score is based on the number of similar pairs of vectors between the two structures.
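The first two steps (SSEs as vectors, then candidate similar pairs across the two structures) might be sketched as follows. The similarity criterion used here, same SSE type and similar vector length, is an assumption made for illustration and is not VAST's actual criterion. The graph-theory step that selects a mutually consistent subset is left out, and all names and coordinates are hypothetical.

```python
import math

def sse_vector(start, end):
    """Vector from the first to the last point of a secondary structure element."""
    return tuple(e - s for s, e in zip(start, end))

def similar_pairs(sses_a, sses_b, length_tol=2.0):
    """Index pairs (i, j) whose SSEs share a type and have similar length.
    Each SSE is a tuple (kind, start_xyz, end_xyz)."""
    pairs = []
    for i, (kind_a, start_a, end_a) in enumerate(sses_a):
        len_a = math.hypot(*sse_vector(start_a, end_a))
        for j, (kind_b, start_b, end_b) in enumerate(sses_b):
            len_b = math.hypot(*sse_vector(start_b, end_b))
            if kind_a == kind_b and abs(len_a - len_b) <= length_tol:
                pairs.append((i, j))
    return pairs

# Hypothetical SSEs for two small structures.
structure_a = [("helix", (0, 0, 0), (15, 0, 0)),
               ("strand", (0, 5, 0), (0, 5, 10))]
structure_b = [("strand", (1, 1, 1), (1, 1, 12)),
               ("helix", (2, 0, 0), (2, 14, 0))]
print(similar_pairs(structure_a, structure_b))  # [(0, 1), (1, 0)]
```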

  33. LOCK • Define local secondary structures • Find an initial superposition by using DP to align secondary structure vectors. • Use a greedy algorithm to find nearest neighbors and minimize RMSD between the Cα atoms from query and target. • Find the core of aligned Cα atoms and minimize RMSD between them.
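The nearest-neighbor and RMSD steps can be illustrated in isolation. The sketch below assumes the two structures have already been superposed, pairs each query Cα with its closest unused target Cα within a cutoff, and reports the RMSD over those pairs. It does not perform the rotation and translation optimization that an actual RMSD minimization requires. The function names, the 3.0 Å cutoff and the coordinates are assumptions, not taken from LOCK itself.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def greedy_pairs(query, target, cutoff=3.0):
    """Pair each query atom with its nearest unused target atom within cutoff (Å)."""
    used, pairs = set(), []
    for i, p in enumerate(query):
        best = None
        for j, q in enumerate(target):
            if j in used:
                continue
            d = math.dist(p, q)
            if d <= cutoff and (best is None or d < best[1]):
                best = (j, d)
        if best is not None:
            used.add(best[0])
            pairs.append((i, best[0]))
    return pairs

# Hypothetical, already-superposed C-alpha traces.
query = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
target = [(0.5, 0.0, 0.0), (4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
pairs = greedy_pairs(query, target)
print(pairs)  # [(0, 0), (1, 1)]
core_rmsd = rmsd([query[i] for i, _ in pairs], [target[j] for _, j in pairs])
print(round(core_rmsd, 2))  # 0.38
```

The third query atom has no target atom within the cutoff, so it drops out of the aligned core, which is the kind of pruning the last bullet describes.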

  34. GenBank Where is the data? DB are equivalent

  35. • RefSeq (NCBI Reference Sequences): http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html • GenPept Database: http://inn.weizmann.ac.il/databanks/genpept.html • http://www.expasy.org/sprot/ (STATS: http://www.expasy.org/sprot/relnotes/relstat.html) • http://www.rcsb.org/pdb/ • PIR International Protein Sequence Database: http://pir.georgetown.edu/pirwww/search/textpsd.shtml • http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein

  36. MMDB by NCBI…

  37. A Flow chart for structure prediction Protein sequence Database similarity search Protein family, domain, cluster analysis Does sequence align with protein of known 3D structure? no Predicted three-dimensional structure 3D comparative modeling Relationship to known structure? yes no 3D analysis in laboratory Is there a predicted structure? Structural analysis no

  38. Images.. • 3-dimensional model showing the electron density in a molecule of buckminsterfullerene, an allotrope of carbon (C60).

  39. Images… Computer generated image, showing 3-D structure of uteroglobin, a protein secreted in the uterus of mammals.

  40. Images… (NMR… EPR…) A computer image of the charge density over the molecule chymosin, an important enzyme in cheese making. Overall negative charge is depicted as red, overall positive charge is shown in blue.

  41. X-ray crystallography.

  42. Thanks Thanks to Selnur Erdal for preparing initial versions of these slides.
