PhDcoursehomology..

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS TECHNICAL UNIVERSITY OF DENMARK DTU ...

Presentation Transcript


    1. Homology Modeling

    Anne Mølgaard, CBS, BioCentrum, DTU

    2. Why can we do it?

    The structure of a protein is uniquely determined by its amino acid sequence, but sequence is sometimes not enough: prions, pH, ions, cofactors and chaperones. Structure is conserved much longer than sequence in evolution.

    3. How often can we do it?

    There are currently ~30,000 structures in the PDB (but only ~4,000 if you include only those that are not more than 30% identical and have a resolution better than 3.0 Å). An estimate says that ~50% of all sequences have a structurally characterized homolog.

    4. Worldwide Structural Genomics

    “Fold space coverage”, complete genomes, signaling proteins, improving technology, disease-causing organisms, model organisms, membrane proteins, protein-ligand interactions.

    5. Structural Genomics in North America

    A 10-year, $600 million project initiated in 2000, funded largely by NIH. Aim: structural information on 10,000 unique proteins (now 4,000-6,000); so far 1,000 have been determined. Improve current techniques to reduce time (from months to days) and cost (from $100,000 to $20,000 per structure). 9 research centers are currently funded (2005); targets are from model and disease-causing organisms (with a separate project on TB proteins).

    6. Homology modeling for structural genomics

    Roberto Sánchez et al. Nature Structural Biology 7, 986–990 (2000)

    7. How well can we do it?

    Sali, A. & Kuriyan, J. Trends Biochem. Sci. 22, M20–M24 (1999) 

    9. How can we do it?

    Identify template(s) and make an initial alignment; improve the alignment; backbone generation; loop modeling; side chains; refinement; validation.

    10. Template identification

    Search with sequence: Blast, Psi-Blast, fold recognition methods. Use biological information: functional annotation in databases, active site/motifs.

    11. Template quality

    Selecting the best template is crucial! The best template may not be the one with the highest % id (best p-value…). Template 1: 93% id, 3.5 Å resolution? Template 2: 90% id, 1.5 Å resolution?
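One way to make this trade-off concrete is to filter on resolution first and rank by identity second. The sketch below is only an illustration: the 3.0 Å cutoff is an assumption borrowed from the PDB filtering figure quoted earlier in the talk, and the `Template` class and `pick_template` helper are hypothetical names, not part of any modeling package.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str
    identity: float    # percent sequence identity to the target
    resolution: float  # in Angstrom; a smaller value means better data


def pick_template(templates, max_resolution=3.0):
    """Prefer well-resolved templates, then the highest identity among them."""
    good = [t for t in templates if t.resolution <= max_resolution]
    pool = good or templates  # fall back to all templates if none passes
    # Rank by identity; break ties in favour of the better resolution.
    return max(pool, key=lambda t: (t.identity, -t.resolution))


candidates = [
    Template("Template 1", identity=93.0, resolution=3.5),
    Template("Template 2", identity=90.0, resolution=1.5),
]
best = pick_template(candidates)
print(best.name)
```

With this assumed cutoff the 3.5 Å structure is set aside despite its slightly higher identity. A real choice would also weigh the other quality indicators covered in the following slides.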

    [Figure: resolution scale from low to high (4.0 Å, 3.0 Å, 1.8 Å, 1.0 Å) and the level of detail visible: molecules, secondary structure elements, residues, atoms]

    13. Template quality – Ramachandran plot

    14. Improving the alignment

      1   2   3   4   5   6   7   8   9  10  11  12  13  14
    PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
    PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
    PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS

    From “Professional Gambling” by Gert Vriend: http://www.cmbi.kun.nl/gv/articles/text/gambling.html
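The two alternative placements of the gap can be compared by counting identical residues over the aligned (non-gap) columns. The sketch below does only that; the function name `identity` is hypothetical, and the residue rows are copied from the slide.

```python
GAP = "---"

first_row = "PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS".split()
alignment_a = "PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS".split()
alignment_b = "PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS".split()


def identity(seq1, seq2):
    """Return (identical columns, aligned columns), skipping gap columns."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(a, b) for a, b in zip(seq1, seq2) if GAP not in (a, b)]
    matches = sum(a == b for a, b in aligned)
    return matches, len(aligned)


for label, row in (("gap at 8-10", alignment_a), ("gap at 6-8", alignment_b)):
    matches, aligned = identity(first_row, row)
    print(f"{label}: {matches}/{aligned} identical")
```

A count like this uses sequence information only. Whether the higher-scoring gap placement is also the one that can be built into a sensible structure is a separate question that the count cannot answer.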

    15. Backbone generation

    Generate the backbone coordinates from the template for the aligned regions. Several programs can do this; most of the groups at CASP6 used Modeller: http://salilab.org/modeller/modeller.html
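The idea of copying coordinates for the aligned regions can be sketched in a few lines. This is not how Modeller works internally; it is a minimal illustration under simplifying assumptions: one-letter sequences with `-` as the gap character, one coordinate triple per template residue standing in for the full backbone, and a hypothetical helper named `copy_backbone`.

```python
def copy_backbone(target_aln, template_aln, template_coords):
    """Map target residue index -> template coordinates for aligned columns."""
    model = {}
    t_idx = 0  # index of the current residue in the ungapped target
    p_idx = 0  # index of the current residue in the ungapped template
    for t_res, p_res in zip(target_aln, template_aln):
        if t_res != "-" and p_res != "-":
            model[t_idx] = template_coords[p_idx]
        if t_res != "-":
            t_idx += 1
        if p_res != "-":
            p_idx += 1
    return model


# Toy alignment: target residue D has no template partner.
target_aln = "ACD-FG"
template_aln = "AC-EFG"
template_coords = [(float(i), 0.0, 0.0) for i in range(5)]  # made-up values

model = copy_backbone(target_aln, template_aln, template_coords)
missing = [i for i in range(5) if i not in model]
print(model)
print("residues left for loop modeling:", missing)
```

Target residues that face a gap in the template receive no coordinates here. Those are the positions that the loop modeling step has to fill in.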

    16. Loop modeling

    Knowledge based: searches the PDB for fragments that match the sequence to be modeled (Levitt, Holm, Baker etc.). Energy based: uses an energy function to evaluate the quality of the loop and minimizes this function by Monte Carlo or MD techniques. Combination of the two.
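The energy-based route can be illustrated with a Metropolis Monte Carlo search. The sketch below is a toy: `toy_energy` is a made-up function of a few torsion angles with its minimum at -60 degrees, standing in for a real force field, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical stand-in for a real energy function (minimum at -60 deg)."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)


def monte_carlo(angles, steps=2000, step_size=15.0, temperature=0.5, seed=0):
    """Metropolis search: perturb one angle at a time, keep the best state."""
    rng = random.Random(seed)
    current = list(angles)
    e_current = toy_energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        e_trial = toy_energy(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


start = [120.0, 45.0, -170.0, 10.0]
best, e_best = monte_carlo(start)
print("start energy:", round(toy_energy(start), 3))
print("best energy: ", round(e_best, 3))
```

Accepting some uphill moves is what lets the search escape local minima, which a pure downhill minimizer cannot do.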

    17. Loops – the Rosetta method

    Find fragments (10 per amino acid) with the same sequence and secondary structure profile as the query sequence. Combine them using a Monte Carlo scheme to build the loop (Baker et al.).

    18. Side chains

    Side chain rotamers are dependent on backbone conformation. The most successful method in CASP6 was SCWRL by Dunbrack et al., which uses a graph-theory, knowledge-based method to solve the combinatorial problem of side chain modeling: http://dunbrack.fccc.edu/SCWRL3.php
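To see why side chain placement is a combinatorial problem, consider the brute-force version: try every combination of rotamers and keep the one with the lowest total energy. This is explicitly not SCWRL's algorithm, which uses graph theory to avoid full enumeration; the residue names, rotamer labels and energy values below are all invented for illustration.

```python
from itertools import product

# Hypothetical per-rotamer energies for three residues (made-up numbers).
self_energy = {
    "LEU10": {"r1": 0.2, "r2": 0.5},
    "PHE22": {"r1": 0.1, "r2": 0.3, "r3": 0.9},
    "SER31": {"r1": 0.3, "r2": 0.3},
}

# Hypothetical clash penalties between specific rotamer pairs.
clash = {
    (("LEU10", "r1"), ("PHE22", "r1")): 5.0,
    (("PHE22", "r2"), ("SER31", "r1")): 2.0,
}


def total_energy(assignment):
    """Sum of self energies plus penalties for every clashing pair chosen."""
    energy = sum(self_energy[res][rot] for res, rot in assignment.items())
    for (a, b), penalty in clash.items():
        if assignment[a[0]] == a[1] and assignment[b[0]] == b[1]:
            energy += penalty
    return energy


def best_assignment():
    """Enumerate all rotamer combinations and return the lowest-energy one."""
    residues = list(self_energy)
    options = [list(self_energy[r]) for r in residues]
    candidates = (dict(zip(residues, combo)) for combo in product(*options))
    return min(candidates, key=total_energy)


best = best_assignment()
print(best, round(total_energy(best), 3))
```

Three residues give only 2 x 3 x 2 = 12 combinations, but the count grows multiplicatively with every residue added, which is why exhaustive search does not scale to whole proteins.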

    19. Side chains

    Prediction accuracy is high for buried residues, but much lower for surface residues. Experimental reasons: side chains at the surface are more flexible. Theoretical reasons: it is much easier to handle hydrophobic packing in the core than the electrostatic interactions, including H-bonds to waters.

    20. Side chains

    If the sequence identity is high, the networks of side chain contacts may be conserved, and keeping the side chain rotamers from the template may be better than predicting new ones.

    21. Refinement

    Methods: energy minimization, molecular dynamics. Big errors like atom clashes can be removed, but force fields are not perfect and small errors will also be introduced. Keep minimization to a minimum or matters will only get worse.

    22. Error recovery

    If errors are introduced in the model, they normally cannot be recovered at a later step. The alignment cannot make up for a bad choice of template. Loop modeling cannot make up for a poor alignment. If errors are discovered, the step in which they were introduced should be redone.

    23. Validation

    Most programs will get the bond lengths and angles right. The Ramachandran plot of the model usually looks pretty much like the Ramachandran plot of the template (so select a high quality template). Inside/outside distributions of polar and apolar residues can be useful.
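An inside/outside check can be as simple as counting polar residues that end up buried and apolar residues that end up exposed. The sketch below assumes a relative solvent accessibility per residue is already available from some other tool; the 0.25 burial threshold, the simplified polar/apolar classification and the `burial_summary` helper are all assumptions made for illustration.

```python
# Simplified classification; GLY, CYS and TYR are deliberately left out.
APOLAR = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}
POLAR = {"SER", "THR", "ASN", "GLN", "ASP", "GLU", "LYS", "ARG", "HIS"}


def burial_summary(residues, buried_below=0.25):
    """Count buried polar and exposed apolar residues in (name, rsa) pairs."""
    counts = {"buried_polar": 0, "exposed_apolar": 0, "classified": 0}
    for name, accessibility in residues:
        buried = accessibility < buried_below
        if name in POLAR:
            counts["classified"] += 1
            if buried:
                counts["buried_polar"] += 1
        elif name in APOLAR:
            counts["classified"] += 1
            if not buried:
                counts["exposed_apolar"] += 1
    return counts


# Made-up model: residue name and relative solvent accessibility (0 to 1).
model = [
    ("LEU", 0.05), ("ASP", 0.80), ("PHE", 0.60),
    ("LYS", 0.10), ("GLY", 0.50), ("VAL", 0.12),
]
summary = burial_summary(model)
print(summary)
```

A model with unusually many buried polar or exposed apolar residues, compared with what is seen in experimental structures, deserves a closer look.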

    24. Validation – ProQ server

    ProQ is a neural-network-based predictor that uses a number of structural features to predict the quality of a protein model. ProQ is optimized to find correct models, in contrast to other methods, which are optimized to find native structures. Arne Elofsson's group: http://www.sbc.su.se/~bjorn/ProQ/

    25. Structure validation

    ProCheck: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
    WhatIf server: http://swift.cmbi.kun.nl/WIWWWI//

    26. Homology modeling servers

    Eva-CM performs continuous and automated analysis of comparative protein structure modeling servers. A current list of the best performing servers can be found at: http://cubic.bioc.columbia.edu/eva/doc/intro_cm.html

    27. CASP6 results

    28. The top 4 homology modeling groups in CASP6

    Alfonso Valencia, CASP6 Homology modeling assessment. Dunbrack, Wang & Jin (2004), CASP6 Fold Recognition Assessment. The hardest target in CASP6: 8% id.