What is in a PDB file?

Shuchismita Dutta. Spring 2010.

Presentation Transcript


  1. What is in a PDB file? Shuchismita Dutta Spring 2010

  2. Overview
      Exploring the PDB format file
      File formats and dictionaries
      Finding PDB format files and other files
      Validation

  3. Exploring PDB file
      Metadata
      Coordinates

  4. Title section
      OBSLTE 18-JUL-84 1HHB 2HHB 3HHB 4HHB
      SPLIT 1JGP 1JGQ 1JGO
      CAVEAT 1B86 THERE ARE CHIRALITY ERRORS IN C-ALPHA CENTERS
      REVDAT 4 24-FEB-09 4HHB 1 VERSN
      REVDAT 3 01-APR-03 4HHB 1 JRNL
      REVDAT 2 15-OCT-89 4HHB 3 MTRIX
      REVDAT 1 17-JUL-84 4HHB 0
      SPRSDE 17-JUL-84 4HHB 1HHB

  5. Remarks: the numbers mean something
      REMARK 0
      REMARK 0 THIS ENTRY (2Q41) REFLECTS AN ALTERNATIVE MODELING OF THE
      REMARK 0 ORIGINAL STRUCTURAL DATA (R1XJ5SF) DETERMINED BY AUTHORS
      REMARK 0 OF THE PDB ENTRY 1XJ5: G.E.WESENBERG,D.W.SMITH,
      REMARK 0 G.N.PHILLIPS JR.,E.BITTO,C.A.BINGMAN,S.T.M.ALLARD,
      REMARK 0 CENTER FOR EUKARYOTIC STRUCTURAL GENOMICS (CESG).
      Data collection details (REMARK number in parentheses):
        X-ray source, detector, data collection details (200)
        Fiber diffraction (205)
        NMR (210, 215, 217)
        Neutron diffraction (230)
        Electron crystallography (240)
        Electron microscopy (245)
      Crystallographic details:
        Vm, Matthews coefficient
        Crystallographic symmetry

  6. Remark 3
      Each refinement program has its own template and details for the data it reports

  7. Remarks: the numbers mean something
      Biological assembly information

  8. Example of a virus (1AYN)

  9. Remarks
      Compound details
      Missing residues, atoms
      Geometry: close contacts, bond length, angle and torsion deviations, stereochemistry
      Ligand details
      Related entries
      Sequence details

  10. Chemistry sections: Primary Structure & Ligand
      DBREF 1BH0 A 1 29 UNP P01275 GLUC_HUMAN 53 81
      SEQADV 1BH0 LYS A 17 UNP P01275 ARG 69 ENGINEERED
      SEQADV 1BH0 LYS A 18 UNP P01275 ARG 70 ENGINEERED
      SEQADV 1BH0 GLU A 21 UNP P01275 ASP 73 ENGINEERED
      SEQRES 1 A 29 HIS SER GLN GLY THR PHE THR SER ASP TYR SER LYS TYR
      SEQRES 2 A 29 LEU ASP SER LYS LYS ALA GLN GLU PHE VAL GLN TRP LEU
      SEQRES 3 A 29 MET ASN THR
      MODRES 2F4K NLE A 65 LEU NORLEUCINE
      MODRES 2F4K NLE A 70 LEU NORLEUCINE
      HET PO4 D 147 1
      HET PO4 B 147 1
      HET HEM A 142 43
      HET HEM B 148 43
      HET HEM C 142 43
      HET HEM D 148 43
      HETNAM PO4 PHOSPHATE ION
      HETNAM HEM PROTOPORPHYRIN IX CONTAINING FE
      HETSYN HEM HEME
      FORMUL 5 PO4 2(O4 P 3-)
      FORMUL 7 HEM 4(C34 H32 FE N4 O4)
      FORMUL 11 HOH *221(H2 O)
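
The SEQRES records on this slide give the polymer sequence as three-letter residue names, with the declared chain length (29) repeated on every line. The following is a minimal Python sketch, not part of the original slides, that collects them into a one-letter sequence. It assumes whitespace-separated fields as they appear in this transcript (a real PDB file uses fixed columns), and the helper name `seqres_to_sequence` is hypothetical.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# SEQRES records copied from the slide (whitespace-collapsed).
SEQRES_LINES = """\
SEQRES 1 A 29 HIS SER GLN GLY THR PHE THR SER ASP TYR SER LYS TYR
SEQRES 2 A 29 LEU ASP SER LYS LYS ALA GLN GLU PHE VAL GLN TRP LEU
SEQRES 3 A 29 MET ASN THR
"""


def seqres_to_sequence(text, chain):
    """Return (declared_length, one_letter_sequence) for one chain."""
    declared = None
    letters = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "SEQRES" or fields[2] != chain:
            continue
        declared = int(fields[3])
        # Residues outside the 20 standard amino acids are shown as X.
        letters.extend(THREE_TO_ONE.get(name, "X") for name in fields[4:])
    return declared, "".join(letters)


declared, sequence = seqres_to_sequence(SEQRES_LINES, "A")
print(declared, len(sequence), sequence)
```

Comparing the declared length with the number of residues actually collected is a cheap consistency check. Positions 17, 18 and 21 of the result are the residues flagged as ENGINEERED in the SEQADV records.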

  11. Secondary Structure & Connectivity
      HELIX 1 AA SER A 3 GLY A 18 1 16
      HELIX 2 AB HIS A 20 SER A 35 1 16
      HELIX 3 AC PHE A 36 TYR A 42 1 7
      SHEET 1 A 4 ILE A 18 LEU A 23 0
      SHEET 2 A 4 LEU A 111 VAL A 118 -1 O GLY A 115 N TRP A 19
      SSBOND 1 CYS A 6 CYS A 127 1555 1555 2.02
      SSBOND 2 CYS A 30 CYS A 115 1555 1555 2.02
      SSBOND 3 CYS A 64 CYS A 80 1555 1555 2.03
      SSBOND 4 CYS A 76 CYS A 94 1555 1555 2.01
      LINK NE2 HIS A 87 FE HEM A 143 1555 1555 1.94
      LINK NE2 HIS B 92 FE HEM B 147 1555 1555 2.07
      LINK FE HEM B 147 O1 OXY B 150 1555 1555 1.87
      LINK FE HEM A 143 O1 OXY A 150 1555 1555 1.66
      CISPEP 1 PRO A 98 PRO A 99 0 0.53
      CISPEP 2 GLY A 109 PRO A 110 0 -0.01

  12. Miscellaneous
      SITE 1 ACT 3 HIS H 57 ASP H 102 SER H 195
      SITE 1 AC1 12 HIS H 57 ASN H 98 LEU H 99 ILE H 174
      SITE 2 AC1 12 ASP H 189 ALA H 190 SER H 195 TRP H 215
      SITE 3 AC1 12 GLY H 216 GLY H 219 HOH H 264 HOH H 270
      REMARK 800 SITE
      REMARK 800 SITE_IDENTIFIER: ACT
      REMARK 800 EVIDENCE_CODE: AUTHOR
      REMARK 800 SITE_DESCRIPTION: CATALYTIC SITE
      REMARK 800 SITE_IDENTIFIER: AC1
      REMARK 800 EVIDENCE_CODE: SOFTWARE
      REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE MID H 1

  13. Crystallographic info, Coordinate Transformations & coordinates
      CRYST1 88.814 95.207 89.164 90.00 104.96 90.00 P 1 21 1 8
      ORIGX1 1.000000 0.000000 0.000000 0.00000
      ORIGX2 0.000000 1.000000 0.000000 0.00000
      ORIGX3 0.000000 0.000000 1.000000 0.00000
      SCALE1 0.011259 0.000000 0.003009 0.00000
      SCALE2 0.000000 0.010503 0.000000 0.00000
      SCALE3 0.000000 0.000000 0.011609 0.00000
      MODEL 1
      ATOM 1 N SER A 41 -9.122 -10.304 89.511 0.12 51.94 N
      ATOM 2 CA SER A 41 -8.282 -11.187 88.650 0.12 52.75 C
      ATOM 3 C SER A 41 -7.051 -11.693 89.414 0.12 52.51 C
      ATOM 4 O SER A 41 -6.646 -11.108 90.421 0.12 53.15 O
      ATOM 5 CB SER A 41 -7.845 -10.416 87.393 0.12 51.93 C
      ATOM 6 OG SER A 41 -7.250 -11.264 86.423 0.12 52.59 O
      ATOM 7 N THR A 42 -6.473 -12.792 88.935 0.12 51.75 N
      ATOM 8 CA THR A 42 -5.290 -13.380 89.552 0.12 50.38 C
      ...
      ENDMDL
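
The SCALE records on this slide can be cross-checked against the CRYST1 unit cell. The sketch below is an illustration, not part of the original slides. It assumes the common orthogonalization convention (a along x, b in the xy plane) and an identity ORIGX, as on this slide, and builds the matrix that converts orthogonal coordinates in Ångströms to fractional coordinates. The function name `fractionalization_matrix` is hypothetical.

```python
import math


def fractionalization_matrix(a, b, c, alpha, beta, gamma):
    """Rows of the orthogonal-to-fractional matrix for a unit cell.

    Cell lengths are in Angstroms and angles in degrees. Assumes the
    convention with a along x and b in the xy plane.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume divided by a*b*c.
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [1.0 / a, -cg / (a * sg), (ca * cg - cb) / (a * v * sg)],
        [0.0, 1.0 / (b * sg), (cb * cg - ca) / (b * v * sg)],
        [0.0, 0.0, sg / (c * v)],
    ]


# CRYST1 values from the slide.
computed = fractionalization_matrix(88.814, 95.207, 89.164, 90.00, 104.96, 90.00)

# The 3x3 part of SCALE1..SCALE3 from the slide.
scale_records = [
    [0.011259, 0.000000, 0.003009],
    [0.000000, 0.010503, 0.000000],
    [0.000000, 0.000000, 0.011609],
]

max_diff = max(
    abs(computed[i][j] - scale_records[i][j])
    for i in range(3)
    for j in range(3)
)
print(f"largest difference from the SCALE records: {max_diff:.1e}")
```

A difference well below the six-decimal precision of the SCALE records suggests the entry uses the standard setting. A larger difference would point to a non-standard frame rather than necessarily an error.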

  14. Coordinate section: A Closer look
      Fields, left to right: S.# (atom serial number), atom name, alternate conformer ID, residue name, chain ID, residue #, x coordinate, y coordinate, z coordinate, occupancy, B-factor, atom type
      ATOM 49 N GLY A 8 2.326 4.110 1.416 1.00 42.03 N
      ATOM 50 CA GLY A 8 3.121 3.079 2.065 1.00 42.27 C
      ATOM 51 C GLY A 8 3.533 3.408 3.476 1.00 42.32 C
      ATOM 52 O GLY A 8 4.302 2.642 4.092 1.00 44.09 O
      ATOM 53 N GLY A 9 3.080 4.526 4.038 1.00 40.18 N
      ATOM 54 CA GLY A 9 3.330 4.880 5.396 1.00 40.11 C
      ATOM 55 C GLY A 9 4.552 5.685 5.709 1.00 39.75 C
      ATOM 56 O GLY A 9 4.720 6.098 6.885 1.00 40.96 O
      ATOM 57 N ASER A 10 5.404 6.014 4.753 0.33 39.21 N
      ATOM 58 CA ASER A 10 6.598 6.814 5.042 0.33 38.11 C
      ATOM 59 C ASER A 10 6.236 8.234 5.479 0.33 36.87 C
      ATOM 60 O ASER A 10 5.150 8.733 5.233 0.33 32.77 O
      ATOM 61 CB ASER A 10 7.516 6.864 3.822 0.33 39.46 C
      ATOM 62 OG ASER A 10 8.894 6.884 4.237 0.33 40.79 O
      ATOM 63 N BGLY A 10 5.404 6.014 4.753 0.67 39.21 N
      ATOM 64 CA BGLY A 10 6.598 6.814 5.042 0.67 38.11 C
      ATOM 65 C BGLY A 10 6.236 8.234 5.479 0.67 36.87 C
      ATOM 66 O BGLY A 10 5.150 8.733 5.233 0.67 32.77 O
      Callouts: alternate conformer ID (the A/B prefix on residue 10); microheterogeneity (1ENM)
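
In a real PDB file the ATOM fields sit in fixed columns, which is why the alternate conformer ID appears fused to the residue name (ASER, BGLY) once whitespace is collapsed. The sketch below is an illustration, not part of the original slides. It re-lays two of the slide's records into the ATOM column positions given in the PDB format guide and slices them back out. The helper names `format_atom` and `parse_atom` are hypothetical.

```python
def format_atom(serial, name, altloc, resname, chain, resseq, icode,
                x, y, z, occupancy, bfactor, element):
    """Lay one ATOM record out in the fixed columns of the PDB format."""
    return (
        f"{'ATOM':<6}{serial:>5} {name:<4}{altloc:1}{resname:>3} {chain:1}"
        f"{resseq:>4}{icode:1}{'':3}{x:8.3f}{y:8.3f}{z:8.3f}"
        f"{occupancy:6.2f}{bfactor:6.2f}{'':10}{element:>2}"
    )


def parse_atom(line):
    """Slice an ATOM record by column (0-based Python slices)."""
    return {
        "serial": int(line[6:11]),          # columns 7-11
        "name": line[12:16].strip(),        # columns 13-16
        "altloc": line[16].strip(),         # column 17
        "resname": line[17:20].strip(),     # columns 18-20
        "chain": line[21],                  # column 22
        "resseq": int(line[22:26]),         # columns 23-26
        "icode": line[26].strip(),          # column 27
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
        "occupancy": float(line[54:60]),    # columns 55-60
        "bfactor": float(line[60:66]),      # columns 61-66
        "element": line[76:78].strip(),     # columns 77-78
    }


# The two alternate N atoms of residue 10 from the slide (serials 57 and 63).
lines = [
    format_atom(57, " N", "A", "SER", "A", 10, " ",
                5.404, 6.014, 4.753, 0.33, 39.21, "N"),
    format_atom(63, " N", "B", "GLY", "A", 10, " ",
                5.404, 6.014, 4.753, 0.67, 39.21, "N"),
]
atoms = [parse_atom(line) for line in lines]

# Occupancies of the alternate conformers at one site should sum to 1.
total_occupancy = sum(atom["occupancy"] for atom in atoms)
for atom in atoms:
    print(atom["altloc"], atom["resname"], atom["resseq"], atom["occupancy"])
print("total occupancy:", round(total_occupancy, 2))
```

Slicing by column, rather than splitting on whitespace, keeps the alternate conformer ID and the residue name as separate fields even when they are adjacent.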

  15. Insertion codes, residue numbering (1DWD)
      Fields, left to right: S.# (atom serial number), atom name, residue name, chain ID, residue # with insertion code, x coordinate, y coordinate, z coordinate, occupancy, B-factor, atom type
      ATOM 1 N GLU L 1C 63.677 26.331 17.947 1.00 31.77 N
      ATOM 2 CA GLU L 1C 64.338 26.818 16.736 1.00 35.78 C
      ATOM 3 C GLU L 1C 63.351 27.360 15.717 1.00 41.73 C
      ATOM 4 O GLU L 1C 63.320 28.565 15.489 1.00 49.37 O
      ATOM 5 CB GLU L 1C 65.320 25.825 16.101 1.00 38.64 C
      ATOM 6 N ALA L 1B 62.537 26.499 15.096 1.00 36.03 N
      ATOM 7 CA ALA L 1B 61.571 26.988 14.116 1.00 33.01 C
      ATOM 8 C ALA L 1B 60.631 28.018 14.729 1.00 32.42 C
      ATOM 9 O ALA L 1B 60.238 27.865 15.872 1.00 31.68 O
      ATOM 10 CB ALA L 1B 60.810 25.845 13.511 1.00 33.36 C
      ATOM 11 N ASP L 1A 60.262 29.089 14.012 1.00 33.13 N
      ATOM 12 CA ASP L 1A 59.378 30.016 14.691 1.00 35.05 C
      ATOM 13 C ASP L 1A 57.965 29.526 14.760 1.00 31.74 C
      ATOM 14 O ASP L 1A 57.476 28.873 13.851 1.00 36.72 O
      ATOM 15 CB ASP L 1A 59.593 31.557 14.587 1.00 41.32 C
      ATOM 16 CG ASP L 1A 58.724 32.268 13.564 1.00 46.17 C
      ATOM 17 OD1 ASP L 1A 57.452 32.455 13.924 1.00 47.60 O
      ATOM 18 OD2 ASP L 1A 59.188 32.658 12.472 1.00 48.99 O
      ATOM 19 N CYS L 1 57.321 29.802 15.860 1.00 22.52 N
      ATOM 20 CA CYS L 1 56.005 29.353 16.036 1.00 15.35 C
      ATOM 21 C CYS L 1 55.351 30.160 17.077 1.00 15.83 C
      ATOM 22 O CYS L 1 56.002 30.636 17.968 1.00 18.73 O
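
The 1DWD records on this slide run 1C, 1B, 1A, 1 along chain L, so a residue is identified by its number together with its insertion code, and the order in the file is not alphabetical. The sketch below is an illustration, not part of the original slides. It splits the whitespace-collapsed identifiers from the slide and shows that sorting them would scramble the chain order. The helper name `split_residue_id` is hypothetical.

```python
import re


def split_residue_id(token):
    """Split a collapsed residue ID such as '1C' into (1, 'C')."""
    match = re.fullmatch(r"(-?\d+)([A-Za-z]?)", token)
    if match is None:
        raise ValueError(f"not a residue ID: {token!r}")
    return int(match.group(1)), match.group(2)


# Residue IDs of chain L in the order they appear on the slide.
file_order = ["1C", "1B", "1A", "1"]
keys = [split_residue_id(token) for token in file_order]

print("file order:  ", keys)
print("sorted order:", sorted(keys))
# Sorting by (number, insertion code) does not reproduce the chain order,
# so residues should be kept in the order the file lists them.
print("sorting preserves chain order:", sorted(keys) == keys)
```

The practical consequence is that code walking along a chain should rely on the order of records in the file, and use the (chain, residue number, insertion code) triple only as an identifier.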

  16. Connectivity & Bookkeeping
      CONECT 73 80
      CONECT 80 73 81
      CONECT 81 80 82 84
      CONECT 82 81 83 88
      MASTER 2487 0 28 47 52 0 0 6 73322 31 280 104
      END
      Callouts on the MASTER record: remarks, nonstd residues, helix, sheet, coordinates, SEQRES

  17. The PDB format guide
      Located at http://www.wwpdb.org/documentation/format32/v3.2.html
      Defines all the records that appear in the PDB file
      Includes templates for all records and remarks

  18. www.wwpdb.org

  19. Keeping track of all the information
      The PDB format file is a report from a database
      The database is built on the PDB exchange and chemical component dictionaries
      The PDB exchange dictionary captures every piece of data from the PDB format file to build the mmCIF format file
      Validation uses dictionaries to:
        Check inter-relationships between different data components
        Match information to the chemical component dictionary

  20. [Figure: excerpts of an mmCIF format file and a PDB format file]

  21. PDB Format vs mmCIF Format
      PDB format:
        80 characters wide
        Includes header and coordinates (x, y, z, occupancy and B-factors) for all atoms
        Includes name, source and sequence of all polymers
        Can include a maximum of 62 chains and 99999 atoms
      mmCIF format:
        Free format
        Includes header and coordinates (x, y, z, occupancy and B-factors) for all atoms
        Includes name, source and sequence of all polymers
        No restriction on the number of chains or atoms in the file
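
The two PDB format limits on this slide follow from field widths. The arithmetic below is a small sketch, not part of the original slides. It assumes that the 62-chain limit reflects a one-character chain ID drawn from upper-case letters, lower-case letters and digits, and that the 99999-atom limit reflects the five-column atom serial number.

```python
import string

# Assumption: chain IDs are one character from A-Z, a-z and 0-9.
chain_id_alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits
max_chains = len(chain_id_alphabet)

# The atom serial number occupies five columns, so the largest value has
# five digits.
serial_width = 5
max_atoms = 10 ** serial_width - 1

print("distinct one-character chain IDs:", max_chains)
print("largest five-digit atom serial:", max_atoms)
```

mmCIF stores these values as free-format tokens rather than fixed-width fields, which is why the same limits do not apply there.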

  22. Keeping track of all the information
      The PDB format file is a report from a database
      The database is built on the PDB exchange and chemical component dictionaries
      The PDB exchange dictionary captures every piece of data from the PDB format file to build the mmCIF format file
      Validation uses dictionaries to:
        Check inter-relationships between different data components
        Match information to the chemical component dictionary

  23. Dictionaries
      PDB Exchange (pdbx) dictionary (http://mmcif.pdb.org/)
        Includes the syntax, definitions, relations, boundaries
        Includes examples for the contents of the mmCIF format file
      Chemical Component Dictionary
        Describes all residues in the PDB files (standard and non-standard amino acids, nucleotides and other ligands: ions, drugs, cofactors, inhibitors)
        1-3 alphanumeric character identifier
        Includes model & idealized coordinates for components, connectivities, name, formula, SMILES strings
        Maintained by the wwPDB
        Used for data processing and validation of structures

  24. Ligand cif file

  25. Ligand Expo - Search Options

  26. Ligand Expo – Substructure Search Also use for component building

  27. Ligand Expo - Browse Options

  28. [Figure: excerpts of a PDB format file and an mmCIF format file]
      PDB Exchange Dictionary includes syntax & definitions for mmCIF format files
      Instance of valine matched to VAL in Chemical Component Dictionary

  29. Downloading files
      Coordinate
        PDB: 80 characters wide; created for X-ray structures; updated for NMR, EM and other methods
        mmCIF: more flexible format; based on mmCIF (PDBX) dictionary
        PDBML: XML translation of mmCIF format files
        Biological Unit
      Experimental data
        SF files: distributed in mmCIF format
        Constraints file: validated by BMRB

  30. Archive download

  31. The ftp archive

  32. RCSB PDB website

  33. The Structure Summary page

  34. Asymmetric and Biological Unit

  35. Structure Analysis (RCSB tables)

  36. Validation
      Quality assessment
        Is the structure well determined overall?
        Is the structure suitable for your analysis and/or modeling requirements?
        Are local regions that you are interested in well determined?

  37. When to Validate?
      [Flowchart] Numbered steps: Step 0: Validation; Step 1: PDB ID; Step 2: Validation Report; Step 3: Corrections; Step 4: Depositor Approval; Step 5: Functional Annotation
      Other labels in the flowchart: Refinement, Depositor, Deposition, Primary Annotation, Core Database, Archival Data, PDB Entry, Data Distribution Site, Download Data, Validation, Use of PDB data

  38. What is validated?
      Chemistry
        Of polymer (match to DB and internal consistency)
        Of ligands, ions, inhibitors (match to dictionary)
      Geometry
        Close contacts
        Bond length, angle, torsion etc. deviations
        Ramachandran plot
      Experimental data
        SF check
        R factors
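
A bond length deviation, one of the geometry checks listed on this slide, is the difference between a distance measured from the coordinates and a reference value. The sketch below is an illustration, not part of the original slides, and uses the N and CA atoms of GLY A 8 from the coordinate slide (serials 49 and 50). The reference length of about 1.46 Å and the 0.05 Å tolerance are assumed round numbers for illustration. Validation programs take their targets from curated restraint libraries.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y, z) points in Angstroms."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# Coordinates of GLY A 8 from the coordinate slide.
n_atom = (2.326, 4.110, 1.416)   # ATOM 49, N
ca_atom = (3.121, 3.079, 2.065)  # ATOM 50, CA

# Assumed values for illustration only (not from the slides).
REFERENCE_N_CA = 1.46   # approximate N-CA bond length in Angstroms
TOLERANCE = 0.05        # arbitrary cutoff in Angstroms

bond_length = distance(n_atom, ca_atom)
deviation = bond_length - REFERENCE_N_CA
within_tolerance = abs(deviation) <= TOLERANCE

print(f"N-CA length: {bond_length:.3f} A, deviation: {deviation:+.3f} A")
print("within tolerance:", within_tolerance)
```

The same distance function applied to atoms that are not bonded gives the close-contact check. Only the reference value and the direction of the comparison change.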

  39. How to validate?
      MolProbity
      EDS server
      PROCHECK
      WHAT_CHECK / WHAT IF
      Validation server at RCSB PDB

  40. Electron Density Server report

  41. Real-space R-value

  42. Electron Density Server report

  43. Real-space R-value

  44. Validation at RCSB PDB
