LSM2104/CZ2251 Essential Bioinformatics and Biocomputing  PowerPoint Presentation
Presentation Transcript

  1. LSM2104/CZ2251 Essential Bioinformatics and Biocomputing: Protein Structure and Visualization (2). Chen Yu Zong, csccyz@nus.edu.sg, 6874-6877

  2. LSM2104/CZ2251 Essential Bioinformatics and Biocomputing. Lecture 10: Protein structure databases, visualization, and classifications 1. Introduction to the Protein Data Bank (PDB) 2. Free graphics software for 3D structure visualization 3. Hierarchical classification of protein domains: SCOP & CATH & DALI

  3. 1. Protein Data Bank (PDB) • Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) • http://www.rcsb.org/pdb/ • 30,060 structures (15-Mar-2005) • 27,570 structures (05-Oct-2004) • 23,997 structures (20-Jan-2004) • Also contains structures of other biomacromolecules and complexes: DNA, carbohydrates, and protein-DNA complexes.
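The three snapshot counts above can be turned into a rough deposition rate. This is a minimal sketch using only the figures quoted on this slide; it assumes linear growth between snapshots, and the names `SNAPSHOTS` and `growth_between` are illustrative.

```python
from datetime import date

# Snapshot counts quoted on the slide: (date, number of structures in the PDB)
SNAPSHOTS = [
    (date(2004, 1, 20), 23997),
    (date(2004, 10, 5), 27570),
    (date(2005, 3, 15), 30060),
]


def growth_between(snapshots):
    """For each consecutive pair of snapshots return (days, added, added per day)."""
    result = []
    for (d0, n0), (d1, n1) in zip(snapshots, snapshots[1:]):
        days = (d1 - d0).days
        added = n1 - n0
        result.append((days, added, added / days))
    return result


if __name__ == "__main__":
    for days, added, rate in growth_between(SNAPSHOTS):
        print(f"{added} structures in {days} days = {rate:.1f} per day")
```

Run as a script, it reports about 13.8 structures per day between January and October 2004 and about 15.5 per day between October 2004 and March 2005, so the rate was still rising.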

  4. 1. Protein Data Bank (PDB)

  5. 1. Protein Data Bank (PDB)

  6. PDB Content Growth

  7. PDB Presentation of Selected Molecules

  8. Deficiencies in our structural knowledge • Only deposited data is actually available: many structures are not deposited in the PDB. Why? • Structures are available mainly for soluble proteins; there are only a few dozen entries for membrane protein domains. Why? • X-ray data exist only for proteins that crystallize well and diffract properly. Why? • NMR structures are usually of small proteins. How could we survey the size of NMR-determined proteins? • It is estimated that structural data are available for only 10-15% of all known proteins.
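One way to approach the question about NMR structure sizes is to scan local copies of PDB entries, keep those whose EXPDTA record mentions NMR, and count residues. The sketch below assumes the legacy PDB text format and counts distinct residues among ATOM records of the first model only (NMR files repeat the chain once per model); the function names and the choice of the median as a summary are illustrative assumptions.

```python
import statistics


def is_nmr_entry(pdb_lines):
    """True if the EXPDTA record mentions NMR (e.g. 'SOLUTION NMR')."""
    for line in pdb_lines:
        if line.startswith("EXPDTA") and "NMR" in line[10:].upper():
            return True
    return False


def count_residues(pdb_lines):
    """Count distinct residues in ATOM records of the first model only."""
    residues = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # NMR files repeat every residue once per model
        if line.startswith("ATOM"):
            # chain ID (col 22), residue number (cols 23-26), insertion code (col 27)
            residues.add((line[21:22], line[22:26], line[26:27]))
    return len(residues)


def survey_nmr_sizes(entries):
    """entries: dict of PDB code -> list of lines. Returns (sizes, median size)."""
    sizes = {code: count_residues(lines)
             for code, lines in entries.items() if is_nmr_entry(lines)}
    median = statistics.median(list(sizes.values())) if sizes else None
    return sizes, median
```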

  9. Alternative Source of Structure: NCBI

  10. Protein Structure in PDB • Text files • Each entry is identified by a unique 4-character code (PDB code), e.g. 1HUY for a variant of GFP and 1BGK for a 37-residue toxin protein isolated from a sea anemone • 1HUY and 1BGK • Header information • Atomic coordinates in Å (1 Å = 10⁻¹⁰ m)
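Classic PDB codes follow a fixed pattern: a digit from 1 to 9 followed by three letters or digits. A small validator, plus the Å-to-nm conversion from the last bullet, might look like this; the helper names are hypothetical and only the 4-character format is handled.

```python
import re

# Classic PDB ID: a digit 1-9 followed by three letters or digits (e.g. 1HUY, 1BGK).
PDB_ID_PATTERN = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def normalize_pdb_id(code):
    """Return the upper-case PDB ID, or raise ValueError if it is malformed."""
    code = code.strip()
    if not PDB_ID_PATTERN.match(code):
        raise ValueError(f"not a 4-character PDB ID: {code!r}")
    return code.upper()


def angstrom_to_nm(value):
    """Convert Å to nm: 1 Å = 1e-10 m = 0.1 nm."""
    return value / 10.0
```

For example, `normalize_pdb_id("1huy")` returns `"1HUY"`, while `"HUY1"` is rejected because the first character must be a digit.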

  11. Header Details • Identifies the molecule, modifications, and date of release • Host organism, keywords, method of study • Authors, reference, and resolution for X-ray structures (the smaller the number, the better the structure) • Sequence, reference
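Because a smaller resolution value means a better X-ray structure, entries can be ranked by it. In the legacy PDB format the value is given in a REMARK 2 line such as `REMARK   2 RESOLUTION.    2.00 ANGSTROMS.`, while NMR entries typically read `NOT APPLICABLE` instead. This sketch assumes that layout and leaves entries without a value unranked; the function names are hypothetical.

```python
import re

# Legacy PDB REMARK 2 line, e.g. "REMARK   2 RESOLUTION.    2.00 ANGSTROMS."
RESOLUTION_RE = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+\.\d+)\s+ANGSTROMS")


def resolution(pdb_lines):
    """Return the resolution in Å, or None if no numeric value is given."""
    for line in pdb_lines:
        match = RESOLUTION_RE.match(line)
        if match:
            return float(match.group(1))
    return None


def rank_by_resolution(entries):
    """entries: dict of PDB code -> lines. Codes sorted best (smallest value) first."""
    values = {code: resolution(lines) for code, lines in entries.items()}
    known = [(value, code) for code, value in values.items() if value is not None]
    return [code for value, code in sorted(known)]
```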

  12. The Atomic Coordinates • XYZ coordinates for each atom (records starting with ATOM; only heavy atoms for X-ray structures), from the first residue to the last • XYZ coordinates for any ligands complexed to the biomacromolecule (records starting with HETATM) • O atoms of water molecules (records starting with HETATM, normally at the end of the coordinate section) • For X-ray structures, the resolution is usually not high enough to locate H atoms, so only heavy atoms are given in the PDB file. • For NMR structures, all atoms (including hydrogen atoms) are specified in the PDB file.
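The record types above sit in fixed columns, so they can be parsed by slicing (x, y and z occupy columns 31-38, 39-46 and 47-54, and the element symbol columns 77-78). The sketch below is an illustrative parser, not a full implementation of the format: it ignores alternate locations and insertion codes, counts hydrogens only when the element column is filled in, and for multi-model NMR files counts every model. The `Atom` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str      # "ATOM" or "HETATM"
    name: str        # atom name, e.g. "CA"
    res_name: str    # residue name, e.g. "MET" or "HOH"
    chain: str
    res_seq: int
    x: float         # coordinates in Å
    y: float
    z: float
    element: str     # element symbol, e.g. "N", "H", "FE"


def parse_atoms(pdb_lines):
    """Parse ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21:22].strip(),
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            element=line[76:78].strip(),
        ))
    return atoms


def summarize(atoms):
    """Count polymer, ligand and water atoms; hydrogens are counted separately too."""
    summary = {"polymer": 0, "ligand": 0, "water": 0, "hydrogen": 0}
    for atom in atoms:
        if atom.element == "H":
            summary["hydrogen"] += 1
        if atom.record == "ATOM":
            summary["polymer"] += 1
        elif atom.res_name == "HOH":
            summary["water"] += 1
        else:
            summary["ligand"] += 1
    return summary
```

Typical use would be `summarize(parse_atoms(open("1HUY.pdb")))` on a locally downloaded file; the file name is only an example.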

  13. X-ray structure 1HUY

  14. NMR structure 1BGK

  15. 2. Free Software for Protein Structure Visualization • RASMOL: available for all platforms http://www.openrasmol.org • Swiss PDB Viewer: from Swiss-Prot http://www.expasy.ch/spdbv/ • Chemscape Chime Plug-in: for PC and Mac http://www.mdl.com/downloads/downloadable/index.jsp • YASARA: http://www.yasara.org/ • MOLMOL: MOLecule analysis and MOLecule display http://129.132.45.141/wuthrich/software/molmol/index.html

  16. Ribbon representation by RasMol: 1HUY, an improved yellow variant of green fluorescent protein, from Tsien’s group (J. Biol. Chem. 276, 29188, 2001)

  17. Ribbon representation by YASARA

  18. Ribbon representation by YASARA

  19. Ribbon representation by MOLMOL

  20. An ensemble of 15 NMR structures of the toxin Bgk (hydrogen atoms included): the 15 backbone structures of the sea anemone toxin Bgk
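The spread of an NMR ensemble like this one is often summarized as the RMSD of each model from a reference model. A minimal sketch, assuming the models already share a common reference frame (no superposition step such as the Kabsch algorithm is performed) and comparing Cα atoms matched by chain, residue number and atom name; the function names are hypothetical.

```python
import math


def split_models(pdb_lines):
    """Return a list of models, each a dict (chain, res_seq, atom_name) -> (x, y, z)."""
    models, current = [], {}
    for line in pdb_lines:
        if line.startswith("MODEL"):
            current = {}
        elif line.startswith("ATOM"):
            key = (line[21:22], int(line[22:26]), line[12:16].strip())
            current[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        elif line.startswith("ENDMDL"):
            models.append(current)
            current = {}
    if current:  # single-model files have no MODEL/ENDMDL records
        models.append(current)
    return models


def rmsd(model_a, model_b, atom_name="CA"):
    """RMSD in Å over shared atoms with the given name; no superposition is done."""
    keys = [k for k in model_a if k[2] == atom_name and k in model_b]
    if not keys:
        raise ValueError("no common atoms to compare")
    total = sum(math.dist(model_a[k], model_b[k]) ** 2 for k in keys)
    return math.sqrt(total / len(keys))


def ensemble_rmsd(pdb_lines, atom_name="CA"):
    """RMSD of every model against the first model."""
    models = split_models(pdb_lines)
    return [rmsd(models[0], model, atom_name) for model in models[1:]]
```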

  21. 15 all-atom structures of the sea anemone toxin Bgk: line representation

  22. Ribbon representation

  23. Space-filling representation

  24. 3. Hierarchical classification of protein domains: SCOP & CATH • SCOP: Structural Classification of Proteins, University of Cambridge, UK http://scop.mrc-lmb.cam.ac.uk/scop/ (hyperlink in Singapore: http://scop.bic.nus.edu.sg/) • CATH: Class, Architecture, Topology, Homologous superfamily, Sequence family; University College London, UK http://www.biochem.ucl.ac.uk/bsm/cath/

  25. Basis for protein classification • Proteins adopt a limited number of topologies: more than 50,000 sequences fold into ~1,000 unique folds. • Homologous sequences have similar structures: usually, when sequence identity is >30%, proteins adopt the same fold. Even in the absence of sequence homology, some folds are preferred by vastly different sequences. • The “active site” is highly conserved: a subset of functionally critical residues is conserved even when the folds vary.
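The 30% rule of thumb depends on how percent identity is computed. A minimal sketch for two sequences that have already been aligned (gaps written as "-"), counting identical residues over the columns where neither sequence has a gap; other conventions divide by the shorter sequence or by the full alignment length, so values can differ between tools. The function names and the default threshold argument are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Identical residues are counted over columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def likely_same_fold(aligned_a, aligned_b, threshold=30.0):
    """Rule of thumb from the slide: identity above ~30% usually means the same fold."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

For example, `ACDE-G` aligned with `ACDFHG` has five gap-free columns, four of them identical, giving 80%.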

  26. How many unique folds do organisms use to express functions? [Figure: many sequences form one unique fold; sequence space (> 50,000 sequences) maps onto conformational space (~1,000 folds).]

  27. Growth of Protein Databases

  28. Structural Classification of Proteins (SCOP) • University of Cambridge, UK: http://scop.mrc-lmb.cam.ac.uk/scop/ • mirrored in Singapore: http://scop.bic.nus.edu.sg/ • contains PDB entries grouped hierarchically by: • Structural class, • Fold, • Superfamily, • Family, • Individual member (domain-based)
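SCOP labels each family with a dotted code (for example a.1.1.2) whose successive prefixes name the class, fold, superfamily and family. The sketch below expands such a code into its levels and groups domains at a chosen level; the class and function names are hypothetical, the domain IDs in the test are made up, and the individual-member level is not part of the dotted code.

```python
from typing import NamedTuple


class ScopLineage(NamedTuple):
    class_: str        # e.g. "a"
    fold: str          # e.g. "a.1"
    superfamily: str   # e.g. "a.1.1"
    family: str        # e.g. "a.1.1.2"


def parse_sccs(sccs):
    """Expand a dotted SCOP code into its class, fold, superfamily and family."""
    parts = sccs.strip().split(".")
    if (len(parts) != 4 or not parts[0].isalpha()
            or not all(p.isdigit() for p in parts[1:])):
        raise ValueError(f"unexpected SCOP code: {sccs!r}")
    return ScopLineage(*(".".join(parts[:i]) for i in range(1, 5)))


def group_by_level(domains, level):
    """domains: dict domain_id -> dotted code. Group domain IDs at one level."""
    groups = {}
    for domain_id, sccs in domains.items():
        key = getattr(parse_sccs(sccs), level)
        groups.setdefault(key, []).append(domain_id)
    return groups
```

For hypothetical domains d1 = a.1.1.2, d2 = a.1.1.1 and d3 = b.1.1.1, grouping at the superfamily level gives `{'a.1.1': ['d1', 'd2'], 'b.1.1': ['d3']}`.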

  29. Structural Classification of Proteins (SCOP): Family • Proteins are clustered into families on the basis of one of two criteria that imply a common evolutionary origin: • all proteins with residue identities of 30% or greater; • proteins with lower sequence identities but very similar functions and structures, for example globins with sequence identities of 15%.

  30. Structural Classification of Proteins (SCOP): Superfamily • Families whose proteins have low sequence identities, but whose structures and, in many cases, functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies. • Example: actin, the ATPase domain of the heat-shock protein, and hexokinase.

  31. Structural Classification of Proteins (SCOP): Fold • Superfamilies and families are defined as having a common fold if their proteins have the same major secondary structures in the same arrangement with the same topological connections.
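Reading the hierarchy from the bottom up, the deepest level two domains share indicates how closely they are related: the same family or superfamily suggests a common evolutionary origin, while sharing only a fold means the same secondary-structure arrangement and topology. A minimal sketch over dotted SCOP-style codes; `shared_level` is a hypothetical name and the codes in the test are illustrative.

```python
LEVELS = ("class", "fold", "superfamily", "family")


def shared_level(sccs_a, sccs_b):
    """Deepest SCOP level two domains share, given dotted codes like 'a.1.1.2'.

    Returns None if the domains are not even in the same class.
    """
    a, b = sccs_a.split("."), sccs_b.split(".")
    deepest = None
    for i, level in enumerate(LEVELS):
        if a[: i + 1] == b[: i + 1]:
            deepest = level
        else:
            break
    return deepest
```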

  32. Structural Classification of Proteins (SCOP): Class • For the convenience of users, the different folds have been grouped into classes. Most folds are assigned to one of a few structural classes on the basis of the secondary structures of which they are composed.
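In SCOP, classes are assigned by curators rather than by a formula, but a composition-based guess makes the criterion concrete. This sketch assumes a per-residue secondary-structure string in DSSP notation (H, G, I for helices; E, B for strands) and uses arbitrary illustrative thresholds; composition alone cannot distinguish α/β from α+β proteins, so both are reported as mixed. The function names are hypothetical.

```python
def ss_fractions(ss):
    """Fraction of residues in helix ('H', 'G', 'I') and strand ('E', 'B') states."""
    if not ss:
        raise ValueError("empty secondary-structure string")
    helix = sum(ss.count(c) for c in "HGI") / len(ss)
    strand = sum(ss.count(c) for c in "EB") / len(ss)
    return helix, strand


def rough_class(ss, major=0.3, minor=0.1):
    """Crude composition-based class guess; the thresholds are illustrative only."""
    helix, strand = ss_fractions(ss)
    if helix >= major and strand < minor:
        return "all-alpha"
    if strand >= major and helix < minor:
        return "all-beta"
    if helix >= minor and strand >= minor:
        return "mixed (alpha/beta or alpha+beta)"
    return "unclassified"
```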

  33. SCOP Class: All-α topologies: cytochrome b-562, ferritin

  34. SCOP Class: All-α topologies

  35. SCOP Class: All-α topologies

  36. SCOP Class: All-β topologies: β-barrels, β-sandwiches

  37. SCOP Class: All-β topologies

  38. SCOP Class: α/β topologies: α/β horseshoe

  39. SCOP Class: α/β topologies: α/β barrels

  40. SCOP Class: α/β topologies

  41. SCOP Class: Alpha+Beta Topologies

  42. SCOP Class: Alpha+Beta Topologies

  43. Ubiquitin 1ubi

  44. Ubiquitin 1ubi

  45. Ubiquitin 1ubi

  46. Ubiquitin 1ubi