1 / 114

3D STRUCTURE PREDICTION

3D STRUCTURE PREDICTION. INTRODUCTION. A Long, Long Time Ago…. Amino acids started to make complex structures and life has appeared. INTRODUCTION. 3.5 billion years later …(Around today…). We know more proteins sequences than we knew at the beginning of life (and even in the 1970’s!).

lamont
Download Presentation

3D STRUCTURE PREDICTION

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 3D STRUCTURE PREDICTION Doctorado UAM Ana Rojas

  2. INTRODUCTION A Long, Long Time Ago… Amino acids started to make complex structures and life has appeared... Doctorado UAM Ana Rojas

  3. INTRODUCTION 3.5 billion years later…(Around today…) • We know more proteins sequences than we knew at the beginning of life (and even in the 1970’s!). • In some cases we know their function. • And rarely do we know their structure Doctorado UAM Ana Rojas

  4. AAVLYFGREDHTLLVY AAVLYFGREDHTLLVY AAVLYFGREDHTLLVY 2nd pred correlated mutations 2nd pred. Current methods to predict protein structure Structural level Secondary ------ Tertiary Quaternary 1D 2D 3D 4D Schema Additional info -molecular dynamics • Energy minimization -docking Ab Initio -filtered docking No Ab-Initio -homology modeling -threading Doctorado UAM Ana Rojas

  5. Introduction Why to Predict Secondary Structure? Secondary structure prediction is the first step in the protein folding prediction. Predicted secondary structure can be used to help identifying proteinfunction - by searching for similar secondary structural motifs. Good secondary structure prediction is also useful in fold detection. The best fold recognition methods use a combination of sequence profiles and prediction secondary structure. Doctorado UAM Ana Rojas

  6. Introduction WHAT IS PREDICTING SECONDARY STRUCTURE ??? To predict the alpha-beta-loop arrangement of a protein from aa sequence Doctorado UAM Ana Rojas

  7. Why Such a shift? Sequencing DNA is easy= 1-2 days Experimental determination of a protein is difficult= 1-3 years Small targets Doctorado UAM Ana Rojas

  8. PDB Doctorado UAM Ana Rojas

  9. PDB file 1CRN.txt Doctorado UAM Ana Rojas

  10. PDB RECORD DISSECTION (OVERVIEW OF DESCRIPTORS) Protein Data Bank: PDB format For a complete descritpion see: ftp:/ftp.rcsb.org/pub/pdb/doc/format_descritpions/ Contests_Guide_21.txt Doctorado UAM Ana Rojas

  11. PDB RECORD (1) Filename=accession number=PDB code 1)Filename is 4 positions (often 1 digit & 3 letters, i.e.: 1CRN) 2)Be aware: 0HKY means entry HKY does not contain coordinates Header: Describes molecule & gives deposition date HEADER PLANT SEED PROTEIN 30-APR-81 1CRN 1CRND 1 COMPND CRAMBIN 1CRN 4 SOURCE ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED 1CRN 5AUTHOR W.A.HENDRICKSON,M.M.TEETER 1CRN 6 REVDAT 5 16-APR-87 1CRND 1 HEADER 1CRND 2REVDAT 4 04-MAR-85 1CRNC 1 REMARK 1CRNC 1REVDAT 3 30-SEP-83 1CRNB 1 REVDAT 1CRNB 1REVDAT 2 03-DEC-81 1CRNA 1 SHEET 1CRNB 2REVDAT 1 28-JUL-81 1CRN 0 1CRNB 3 CMPND: Name of the molecule Source: organism Revision Date Doctorado UAM Ana Rojas

  12. PDB RECORD (2) HELIX/SHEET/TURN: Secondary structure elements as provided by crystallographer (subjective) HELIX 1 H1 ILE 7 PRO 19 1 3/10 CONFORMATION RES 17,19 1CRN 55HELIX 2 H2 GLU 23 THR 30 1 DISTORTED 3/10 AT RES 30 1CRN 56SHEET 1 S1 2 THR 1 CYS 4 0 1CRNA 4SHEET 2 S1 2 CYS 32 ILE 35 -1 1CRN 58TURN 1 T1 PRO 41 TYR 44 1CRN 59 Disulfide- bridges SSBOND 1 CYS 3 CYS 40 1CRN 60SSBOND 2 CYS 4 CYS 32 1CRN 61SSBOND 3 CYS 16 CYS 26 1CRN 62 CRYST1 40.960 18.650 22.520 90.00 90.77 90.00 P 21 2 1CRN 63ORIGX1 1.000000 0.000000 0.000000 0.00000 1CRN 64ORIGX2 0.000000 1.000000 0.000000 0.00000 1CRN 65ORIGX3 0.000000 0.000000 1.000000 0.00000 1CRN 66SCALE1 .024414 0.000000 -.000328 0.00000 1CRN 67SCALE2 0.000000 .053619 0.000000 0.00000 1CRN 68SCALE3 0.000000 0.000000 .044409 0.00000 1CRN 69 CRYST1, ORIGX1, ORIGX2, ORIGX3, SCALE1, SCALE2, SCALE3 : crystallographic parameters! Doctorado UAM Ana Rojas

  13. PDB RECORD (3) ATOM: one line for each atom with its unique name and its, x, y, z, coordinates ATOM 1 N THR 1 17.047 14.099 3.625 1.00 13.79 1CRN 70ATOM 2 CA THR 1 16.967 12.784 4.338 1.00 10.80 1CRN 71ATOM 3 C THR 1 15.685 12.755 5.133 1.00 9.19 1CRN 72ATOM 4 O THR 1 15.268 13.825 5.594 1.00 9.85 1CRN 73ATOM 5 CB THR 1 18.170 12.703 5.337 1.00 13.02 1CRN 74ATOM 6 OG1 THR 1 19.334 12.829 4.463 1.00 15.06 1CRN 75ATOM 7 CG2 THR 1 18.150 11.546 6.304 1.00 14.23 1CRN 76ATOM 8 N THR 2 15.115 11.555 5.265 1.00 7.81 1CRN 77ATOM 9 CA THR 2 13.856 11.469 6.066 1.00 8.31 1CRN 78ATOM 10 C THR 2 14.164 10.785 7.379 1.00 5.80 1CRN 79ATOM 11 O THR 2 14.993 9.862 7.443 1.00 6.94 1CRN 80ATOM 12 CB THR 2 12.732 10.711 5.261 1.00 10.32 1CRN 81ATOM 13 OG1 THR 2 13.308 9.439 4.926 1.00 12.81 1CRN 82ATOM 14 CG2 THR 2 12.484 11.442 3.895 1.00 11.90 1CRN 83ATOM 15 N CYS 3 13.488 11.241 8.417 1.00 5.24 1CRN 84ATOM 16 CA CYS 3 13.660 10.707 9.787 1.00 5.39 1CRN 85 The TERM record terminates the amino acid chain ATOM 324 CG ASN 46 12.538 4.304 14.922 1.00 7.98 1CRN 393ATOM 325 OD1 ASN 46 11.982 4.849 15.886 1.00 11.00 1CRN 394ATOM 326 ND2 ASN 46 13.407 3.298 15.015 1.00 10.32 1CRN 395ATOM 327 OXT ASN 46 12.703 4.973 10.746 1.00 7.86 1CRN 396TER 328 ASN 46 1CRN 397 Doctorado UAM Ana Rojas

  14. Viewers HEADER LIGAND BINDING PROTEIN 02-MAR-00 1EJE TITLE CRYSTAL STRUCTURE OF AN FMN-BINDING PROTEIN COMPND MOL_ID: 1; COMPND 2 MOLECULE: FMN-BINDING PROTEIN; COMPND 3 CHAIN: A; COMPND 4 ENGINEERED: YES SOURCE MOL_ID: 1; SOURCE 2 ORGANISM_SCIENTIFIC: METHANOBACTERIUM THERMOAUTOTROPHICUM; SOURCE 5 EXPRESSION_SYSTEM_PLASMID: PET15B KEYWDS FMN-BINDING PROTEIN, STRUCTURAL GENOMICS EXPDTA X-RAY DIFFRACTION AUTHOR D.CHRISTENDAT,V.SARIDAKIS,A.BOCHKAREV,C.ARROWSMITH, AUTHOR 2 A.M.EDWARDS REVDAT 2 15-AUG-01 1EJE 1 HEADER KEYWDS REVDAT 1 11-OCT-00 1EJE 0 JRNL AUTH D.CHRISTENDAT,A.YEE,A.DHARAMSI,Y.KLUGER, JRNL AUTH 2 A.SAVCHENKO,J.R.CORT,V.BOOTH,C.D.MACKERETH, JRNL AUTH 3 V.SARIDAKIS,I.EKIEL,G.KOZLOV,K.L.MAXWELL,N.WU, JRNL AUTH 4 L.P.MCINTOSH,K.GEHRING,M.A.KENNEDY,A.R.DAVIDSON, JRNL AUTH 5 E.F.PAI,M.GERSTEIN,A.M.EDWARDS,C.H.ARROWSMITH JRNL TITL STRUCTURAL PROTEOMICS OF AN ARCHAEON JRNL REF NAT.STRUCT.BIOL. V. 7 903 2000 JRNL REFN ASTM NSBIEW US ISSN 1072-8368 REMARK 1 REMARK 2 REMARK 2 RESOLUTION. 2.2 ANGSTROMS. REMARK 3 REMARK 3 REFINEMENT. ATOM 1 N GLY A 1 54.915 15.553 3.252 1.00 26.12 N ATOM 2 CA GLY A 1 54.219 16.804 3.668 1.00 23.30 C ATOM 3 C GLY A 1 54.870 18.009 3.019 1.00 25.07 C ATOM 4 O GLY A 1 55.848 17.853 2.295 1.00 26.88 O ATOM 5 N SER A 2 54.330 19.202 3.252 1.00 22.48 N ATOM 6 CA SER A 2 54.918 20.404 2.680 1.00 25.55 C ATOM 7 C SER A 2 56.202 20.683 3.460 1.00 26.03 C ATOM 8 O SER A 2 56.308 20.321 4.632 1.00 22.51 O ATOM 9 CB SER A 2 53.973 21.594 2.828 1.00 27.30 C …. etc etc TODAY: Programs that take coordinate files (mostly PDB-format) and process those to a graphical display output Some have stereo view options and support animation Doctorado UAM Ana Rojas

  15. Main 3D structure based databases I PDB SCOP :mostly manual, uses CE structure similarity program to decide whether two structures are similar. DALI: uses structure similarity program FSSP CATH:uses structure similarity program SSAP Doctorado UAM Ana Rojas

  16. The rate of new sequences is growing exponentially relative to the rate of protein structures being solved! Doctorado UAM Ana Rojas

  17. WHERE ARE THE STRUCTURES ??? Doctorado UAM Ana Rojas

  18. How could we fill the gap between the number of known sequences and known structures? Structural Genomics Initiative: JCSG Doctorado UAM Ana Rojas

  19. Doctorado UAM Ana Rojas

  20. Doctorado UAM Ana Rojas

  21. Doctorado UAM Ana Rojas

  22. Structural Genomics Initiative: JCSG Predicting Methods Gaeta, Italy How could we fill the gap between the number of known sequences and known structures? or Doctorado UAM Ana Rojas

  23. Structural prediction flowchart Doctorado UAM Ana Rojas Russell:http://speedy.embl-heidelberg.de/gtsp/flowchart2.html

  24. Relationship between sequence and structural similarity Chotia & Lesk, 1986 } %id seq. => same 3D (for sure) %id seq. => sometimes same str. sometimes not depends on the length of the aligned region. Doctorado UAM Ana Rojas

  25. DATABASE SEARCHING NO STRUCTURE HOMOLOG YES HOMOLOGY MODELING SECONDARY STRUCTURE PREDICTION FOLD PREDICTION “THREADING” FINAL STRUCTURE??? SIMPLIFIED PROTEIN STRUCTURE PREDICTION FLOW CHART EXPERIMENTAL SEQUENCE Doctorado UAM Ana Rojas

  26. WHY HOMOLOGY MODELING? Useful to infer function Atargetsequence (without a known structure) * Looking for atemplate Structure changes less than sequence in evolution! Comparative modeling can generate models with <2A r.s.m.d Doctorado UAM Ana Rojas

  27. Sometimes it’s not so easy to find atemplate… Doctorado UAM Ana Rojas

  28. …or to make a good alignment… Template Target ? Template Target A GOOD ALIGNMENT IS THE CRITICAL STEP! Doctorado UAM Ana Rojas

  29. SOME HISTORY First model : LACTALBUMIN. TEMPLATE: lysozyme. (structure will come in 1989) 1990’s expansion of modeling Nowadays: if >40% of seq. identity it is possible to make models comparable to X-ray level low resolution!. How many sequences can be modeled? Mostly up to a quarter of all available sequences! Doctorado UAM Ana Rojas

  30. MODELLING STEPS 1.- identify a suitable structural template 2.-Align and select the templates for modeling N iterations to improve the model 3.- Build the model 4.- Evaluation of the model Doctorado UAM Ana Rojas

  31. STEP 1: IDENTIFYING THE STRUCTURAL TEMPLATE Database searches using BLAST or similar algorithms (more than one template is recommended) When similarity is between 25-30% identity additional detections methods are required Doctorado UAM Ana Rojas

  32. STEP2: ALIGNMENT Sequence search methods are biased towards seq. evolution therefore are not always optimal for modeling purposes Template Target ? Template Target A GOOD ALIGNMENT IS THE CRITICAL STEP! Doctorado UAM Ana Rojas

  33. STEP2: ALIGNMENT (I) another example……. DVSHCIQETVESVGF---------NVIRDYV DVGEAIQEVMESYEVEIDNVIYQVKPIRNLN DVSHCIQETVESVGF---NVI------RDYV DVGEAIQEVMESYEVEIDNVIYQVKPIRNLN if looks good in structure it should be like: Doctorado UAM Ana Rojas

  34. STEP2: ALIGNMENT (II) ? PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS TEMPLATE PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS TARGET (ALIGNMENT 1) PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS TARGET (ALIGNMENT 2) "Alignment 1" is chosen because of the PROs at position 7. But 10 Angstrom gap is too big to close. Doctorado UAM Ana Rojas

  35. STEP 3: MODEL BUILDING • RIGID BODY ASSEMBLY: Align template structures and create a “consensus” frame (average of Ca in core regions) Fit the query seq into this frame. Needs high sequence similarities Caveats: with dissimilar sequences models are usually wrong (espe. deletion and insertion regions) Doctorado UAM Ana Rojas

  36. STEP 3: MODEL BUILDING (I) • SEGMENT MATCHING: Calculates conservation of positions in templates. Then calculates coordinates based on those. • SATISFACTION OF SPATIAL RESTRAINS: Satisfies spatial restrains between templates and query using: distance geometry optimisation Doctorado UAM Ana Rojas

  37. STEP 3: MODEL BUILDING (II) Backbone generation EASY Gap filling: if <3 residues is easy to fix (this size allows few configurations) Canonical Loop generation: common loops, can be modeled form libraries. Side Chain generation Ab Initio loop building Model optimisation Doctorado UAM Ana Rojas

  38. STEP 3: MODEL BUILDING (III) WHAT ABOUT THE SIDE CHAINS? Difficult! Several possible conformations! Those are restricted to certain rotamers What is known about rotamers? Side chain rotamers of conserved residues are themselves conserved Side chain replacements then focus on non-conserved regions There are extensive databases and libraries of rotamers Doctorado UAM Ana Rojas

  39. STEP 3: MODEL BUILDING (IV) HOW TO MODEL ROTAMERS? Caveats: when many side chains need to be replaced… how can I chose the first? Take a model and don’t use the conserved regions (no Gly or PRO) and replace others with Alanine. Side chains are then replaced in order of decreasing rotameric entropy Residues with very narrow rotamer distribution are built first. Otherwise are replaced when there are less degrees of freedom! Doctorado UAM Ana Rojas

  40. ASP32 LYS64 GLU67 GLU71 GLU70 LYS76 STEP 4: MODEL EVALUATION This step is crucial in the whole process! Several programs evaluate the models: Solv_Pref: Computes solvent exposure of the model. Negative values indicate structural stability ProSA (Sippl): Based on potentials extracted from databases. Good models have low energies http://www1.jcsg.org/psqs/psqs.cgi) PSQS: An energy like measurement. It is calculated on the statistical potentials of mean force describing interactions between residue pairs and between single residues and solvent. Values approaching -0.2 are ok. Doctorado UAM Ana Rojas

  41. STEP 4: MODEL EVALUATION (I) Energy-like measure Some structural features of proteins are overrepresented or underrepresented in known protein structures: • Contacts between amino acids • Burial status of amino acid • Secondary structure Doctorado UAM Ana Rojas

  42. STEP 4: MODEL EVALUATION (III) Evaluate models built on alternative alignments with energy-like measures?? Modeling program Good or bad? • 0.212 [d. a. f. u.] • GOOD MODEL! * Doctorado UAM Ana Rojas

  43. STEP 4: MODEL EVALUATION (IV) Energy-like measure again An energy like measure is based on statistics and, in fact, gives a hint if your structure is similar to A typical protein structure or not. In the previous example: Value of -0.212 is OK if the average in PDB is -0.278. But it’s still statistics…. Doctorado UAM Ana Rojas

  44. … HOWEVER Backbones might have conformational changes: see below backbone bending 2bb2 and 1amm Doctorado UAM Ana Rojas

  45. PDB is a mess ! PDB files have some missing atoms, unsolved parts of structures, do not start from AA 1, several atoms with the same number, several structures with the same chain ID, are not consecutively numbered... For example, to automate things in the case of Modeller, it is needed to have correct a alignment... otherwise: “Alignment sequence not found in PDB file” … AND Doctorado UAM Ana Rojas

  46. HOMOLOGY MODELING SERVERS SWISS-MODEL - www.expasy.ch/swissmod/SWISS-MODEL.html An automated comparative modelling server (ExPASy, CH) CPHmodels - www.cbs.dtu.dk/services/CPHmodels/ Server using homology modelling (BioCentrum, Denmark) SDSC1 - cl.sdsc.edu/hm.html Protein structure homology modeling server (San Diego, USA) 3D-JIGSAW - www.bmm.icnet.uk/servers/3djigsaw/ Automated system for 3D models for proteins (Cancer Research UK) WHATIF - www.cmbi.kun.nl/gv/servers/WIWWWI/ WHAT IF Web interface: homology modelling, drug docking, electrostatics calculations, structure validation and visualisation. Doctorado UAM Ana Rojas

  47. HOMOLOGY MODELING SERVERS BIOTECH Validation Suite - biotech.ebi.ac.uk:8400/ An evaluation suite that uses three widely available validation programs (PROCHECK, PROVE and WHAT IF) Verify3D - www.doe-mbi.ucla.edu/Services/Verify_3D/ A tool designed to help in the refinement of crystallographic structures. It also provides a visual analysis of model quality. Loops Database - www.bmm.icnet.uk/loop/ A table of five protein loop classes. (Cancer Research UK) Doctorado UAM Ana Rojas

  48. DATABASE SEARCHING STRUCTURE HOMOLOG SECONDARY STRUCTURE PREDICTION HOMOLOGY MODELING FOLD PREDICTION “THREADING” SIMPLIFIED PROTEIN STRUCTURE PREDICTION FLOW CHART EXPERIMENTAL SEQUENCE NO YES FINAL STRUCTURE??? Doctorado UAM Ana Rojas

  49. Homology Modelling vs Fold Detection 25%: “twilightzone” 0 30 100 % seq. ID Fold Detection Homology Modelling Approach Target Sequence >= 30-50% ID with template Any Sequence?? Atomic Level Fold Level Model Quality The best method of determining 3D structure is to base the model you make on a known structure. If your sequence is sufficiently similar (>30-50% identity) you could generate an all atom model by homology modelling. Doctorado UAM Ana Rojas

  50. FOLD DETECTION BLAST, FASTA CAPABLE TO DETECT VERY DISTANT HOMOLOGY (WHEN SEQUENCE-BASED METHODS FAIL) THREADING eg. FFAS03 GenThreader FOLD RECOGNITION eg HMM Alignment of sequences to structures as in THREADER (Jones et al. 1992) Fold recognition: distant/no clear homology Doctorado UAM Ana Rojas

More Related