Protein structure prediction

nell-banks

Presentation Transcript


  1. Protein structure prediction
     Protein Data Bank (PDB): 46,818 structures (Oct 2007)
     SCOP (Structural Classification of Proteins):
     • 971 folds (major structural similarity)
     • 1,586 superfamilies (probable common evolutionary origin)
     • 3,004 families (clear evolutionary relationship, ~30% identity)
     Nearly all folds are known (?), but there are 5 million known protein sequences (TrEMBL) -> hence the need for structure prediction.

  2. Structure prediction: what for?
     • Sequences diverge more than structures.
     • Usually: structure-activity relationships (site-directed mutagenesis, pharmacological studies, drug design, …)
     • But also:
       • genomic studies: recognizing orphan genes
       • studies of distant evolution

  3. Methods for protein structural studies: known structures
     Simulations at the atomic level: molecular modelling (enthalpic energy) / molecular dynamics / normal modes
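
Molecular dynamics integrates Newton's equations of motion for the atoms. As a minimal sketch, assuming a toy system of two atoms on a line joined by a harmonic bond (the function name and the parameters `k`, `r0` and `dt` are illustrative, not taken from a real force field), one common integrator, velocity Verlet, looks like this:

```python
def velocity_verlet(x, v, mass, k, r0, dt, n_steps):
    """Toy molecular dynamics: two atoms on a line joined by a harmonic bond.

    Potential U = 0.5 * k * (|x1 - x0| - r0)**2 (a stand-in for a force field).
    Returns final positions, final velocities, initial and final total energy.
    """

    def forces(pos):
        d = pos[1] - pos[0]
        sign = 1.0 if d >= 0 else -1.0
        f1 = -k * (abs(d) - r0) * sign  # force on atom 1 = -dU/dx1
        return [-f1, f1]                # equal and opposite force on atom 0

    def total_energy(pos, vel):
        kinetic = sum(0.5 * mass * vi * vi for vi in vel)
        potential = 0.5 * k * (abs(pos[1] - pos[0]) - r0) ** 2
        return kinetic + potential

    e_start = total_energy(x, v)
    f = forces(x)
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]               # drift
        f = forces(x)                                           # forces at new positions
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
    return x, v, e_start, total_energy(x, v)


# Bond stretched by 0.2 from its rest length r0 = 1.0, atoms initially at rest
x, v, e_start, e_end = velocity_verlet([0.0, 1.2], [0.0, 0.0], mass=1.0,
                                       k=100.0, r0=1.0, dt=0.001, n_steps=10_000)
print(e_start, e_end)  # total energy should stay close to its initial value
```

A real simulation uses 3D coordinates, a full force field and dedicated software; the point here is only the integration loop, whose near-conservation of total energy at small time steps is what makes velocity Verlet a common choice.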

  4. Methods for protein structural studies: unknown structures
     Before using molecular mechanics, one must have a "realistic" structure.
     3D structure prediction: 1) homology modelling 2) ab initio folding 3) threading

  5. Homology modelling
     Requires a known 3D structure that is homologous to the query sequence.
     AGLNVIAGSILQNS
     GGINVLAASLLNNS
     e.g. Modeller web server (http://www.salilab.org/modeller)

  6. Homology modelling
     AGLNVIAGSILQNS
     GGINVLAASLLNNS
     e.g. Modeller web server (http://www.salilab.org/modeller)
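
The two sequences on these slides can be compared column by column. A minimal sketch, assuming they are already aligned as shown and that percent identity is counted over ungapped columns (conventions differ between tools; the function name is hypothetical):

```python
def percent_identity(query: str, template: str) -> float:
    """Percent identity over aligned columns, skipping columns with a gap ('-')."""
    if len(query) != len(template):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(query, template) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


print(round(percent_identity("AGLNVIAGSILQNS", "GGINVLAASLLNNS"), 1))  # 57.1
```

Here 8 of the 14 columns are identical (about 57%), above the ~30% identity that slide 1 associates with families.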

  7. Ab initio folding (Baker et al.)
     Target sequence: AGVLVAGHM
     Generation: candidate fragments for each segment (AGV, LVA, GHM) are taken from the Protein Data Bank (PDB) and combined into many candidate structures.
     Minimisation and energy evaluation.
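
The generate-then-evaluate loop on this slide can be sketched as a Metropolis Monte Carlo search over fragment choices, the general idea behind fragment-insertion methods such as those of Baker et al. Everything here is a toy assumption: the fragment library, the torsion values, the `toy_energy` function and the search parameters are invented for illustration, and real methods use overlapping fragments excerpted from PDB structures together with far richer energy terms.

```python
import math
import random

# Hypothetical fragment library for the target AGVLVAGHM, split as on the slide
# into AGV | LVA | GHM. Each candidate is a tuple of toy backbone torsion angles
# (degrees), one per residue.
FRAGMENTS = {
    "AGV": [(-60, -45, -60), (-120, 130, -120), (-65, -40, 140)],
    "LVA": [(-60, -45, -60), (-120, 130, -120), (-70, -35, -60)],
    "GHM": [(-60, -45, -60), (80, 10, -120), (-120, 130, -120)],
}


def toy_energy(conformation):
    """Hypothetical energy: penalise large torsion jumps between consecutive residues."""
    angles = [a for fragment in conformation for a in fragment]
    return sum(((b - a) / 100.0) ** 2 for a, b in zip(angles, angles[1:]))


def fragment_assembly(windows, n_steps=2000, temperature=0.5, seed=0):
    """Metropolis Monte Carlo over fragment choices; returns the best conformation seen."""
    rng = random.Random(seed)
    current = [rng.choice(FRAGMENTS[w]) for w in windows]  # random initial model
    e_current = toy_energy(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        i = rng.randrange(len(windows))               # pick a window...
        trial = list(current)
        trial[i] = rng.choice(FRAGMENTS[windows[i]])  # ...and insert another fragment
        e_trial = toy_energy(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill ones with Boltzmann probability
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


best, e_best = fragment_assembly(["AGV", "LVA", "GHM"])
print(best, e_best)
```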

  8. Threading (1): Protein Data Bank (PDB) -> families

  9. Threading (1): each family -> family core + interactions; Protein Data Bank -> library of cores
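
A core can be stored as its number of positions plus the list of position pairs that interact in 3D. A minimal sketch, assuming a contact means two C-alpha atoms closer than 8 Å and at least three positions apart in the chain (both thresholds are common choices but assumptions here; the coordinates, the family name and the function name are hypothetical):

```python
import math


def contact_pairs(coords, cutoff=8.0, min_separation=3):
    """Pairs (i, j) of core positions whose C-alpha atoms are 3D neighbours.

    coords: list of (x, y, z) C-alpha coordinates, one per core position (Å).
    A pair counts when the distance is below `cutoff` and j - i >= min_separation,
    so trivially close chain neighbours are ignored.
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


# Hypothetical C-alpha trace of a 5-position core (consecutive atoms 3.8 Å apart)
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0), (3.8, 3.8, 0.0)]
# One library entry per family: (number of core positions, contacting pairs)
core_library = {"hypothetical_family_1": (len(coords), contact_pairs(coords))}
print(core_library)  # {'hypothetical_family_1': (5, [(0, 4), (1, 4)])}
```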

  10. Threading (2): statistics on 3D-neighboring residue pairs in the Protein Data Bank (PDB) -> pair energies, e.g. (A, L) = -1.2, (A, I) = -2.2, …
      Other characteristics: residue accessibility, secondary structure, …
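
The slide does not say how the pair statistics become energies. One widely used choice is an inverse-Boltzmann log-odds, E(a, b) = -ln(observed / expected), so over-represented pairs get negative (favourable) energies. A minimal sketch under that assumption, with a simple random-pairing reference state and a pseudocount (both assumptions, as are the function name and the toy contact list):

```python
import math
from collections import Counter


def pair_potential(contacts, pseudocount=1.0):
    """Knowledge-based pair energies E(a, b) = -ln(observed / expected).

    contacts: list of (residue_a, residue_b) observed 3D-neighbour pairs.
    Expected counts assume residues pair at random according to their
    frequency in the contact list (a simplified reference state).
    """
    pair_counts = Counter(tuple(sorted(p)) for p in contacts)
    residue_counts = Counter(r for p in contacts for r in p)
    n_pairs = len(contacts)
    n_res = sum(residue_counts.values())  # = 2 * n_pairs
    residues = sorted(residue_counts)
    energies = {}
    for i, a in enumerate(residues):
        for b in residues[i:]:
            fa = residue_counts[a] / n_res
            fb = residue_counts[b] / n_res
            # An unordered pair a != b can arise as (a, b) or (b, a)
            expected = n_pairs * (fa * fb if a == b else 2 * fa * fb)
            observed = pair_counts[(a, b)]
            energies[(a, b)] = -math.log((observed + pseudocount) / (expected + pseudocount))
    return energies


contacts = [("L", "I")] * 6 + [("A", "G")] * 2 + [("A", "L")]
energies = pair_potential(contacts)
print(round(energies[("I", "L")], 2), round(energies[("I", "I")], 2))  # -0.74 0.69
```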

  11. Threading (3): the core

  12. Threading (3): thread the sequence onto the core
      GGINVLAGSLLNNS
      Pair energies: (V, I) = -2.3, (L, N) = -4.2, (L, G) = -5.1
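
The score of one threading can be written as a sum of pair energies over the core contacts. In the notation below (introduced here, not on the slides), C is the set of contacting core positions (i, j), a(i) is the sequence position placed on core position i, s is the residue at that position, and e is the pair energy from slide 10. The second line counts only the three pairs displayed on this slide, as if they were the core's only contacts:

```latex
E = \sum_{(i,j) \in C} e\left(s_{a(i)},\, s_{a(j)}\right)

E_{\text{slide 12}} = (-2.3) + (-4.2) + (-5.1) = -11.6
```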

  13. Threading (3): thread the sequence onto the core
      AGGINVLAGSLLNN
      Pair energies: (N, G) = -1.3, (V, I) = -2.2, (S, A) = -4.2

  14. Threading (3): thread the sequence onto the core
      LAGGINVLAGSLLN
      Pair energies: (I, G) = -3.3, (N, G) = -3.0, (G, L) = -2.1
      Compute the energy for every alignment of the sequence onto the core (many alignments, gaps, …).
      Thread the sequence onto all cores -> choose the best core (lowest energy).
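
The search described on this slide, scoring every placement of the sequence on every core and keeping the lowest energy, can be sketched as follows. To stay short, the sketch only considers ungapped placements (sliding offsets), whereas the slide notes that real threading must also handle gaps; the cores, pair energies and function names are hypothetical.

```python
def threading_energy(sequence, offset, contacts, pair_energy):
    """Energy of placing sequence[offset + i] on core position i (no gaps)."""
    total = 0.0
    for i, j in contacts:
        a, b = sequence[offset + i], sequence[offset + j]
        # Pair energies are treated as symmetric; unknown pairs contribute 0
        total += pair_energy.get((a, b), pair_energy.get((b, a), 0.0))
    return total


def best_threading(sequence, cores, pair_energy):
    """Try every ungapped offset on every core; return (energy, core name, offset)."""
    best = None
    for name, (length, contacts) in cores.items():
        for offset in range(len(sequence) - length + 1):
            energy = threading_energy(sequence, offset, contacts, pair_energy)
            if best is None or energy < best[0]:
                best = (energy, name, offset)
    return best


# Hypothetical core library: name -> (number of core positions, contacting position pairs)
cores = {"coreA": (3, [(0, 2)]), "coreB": (4, [(0, 3), (1, 2)])}
# Hypothetical pair energies (negative = favourable)
pair_energy = {("I", "V"): -2.0, ("G", "V"): -1.0, ("N", "L"): -0.5, ("I", "N"): -1.5}
print(best_threading("GINVL", cores, pair_energy))  # (-2.5, 'coreB', 0)
```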

  15. Threading
      Can be used when sequence tools (BLAST or PSI-BLAST) cannot find similarities.
      Threading methods are still under development:
      • optimisation of 3D alignments
      • better core definition
      • statistical assessment of results
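
For the statistical assessment of results, one common approach (not necessarily the one used in these slides) is a z-score: compare the score of the real sequence with the scores of many shuffled sequences of the same composition. A minimal sketch, assuming a lower score is better; `count_li` is a hypothetical stand-in for a threading energy:

```python
import random
import statistics


def z_score(score_fn, sequence, n_shuffles=200, seed=0):
    """Z-score of score_fn(sequence) against shuffled sequences of the same composition.

    Lower scores are assumed to be better, so a strongly negative z-score means the
    real sequence fits much better than random sequences with the same residues.
    """
    rng = random.Random(seed)
    native = score_fn(sequence)
    background = []
    for _ in range(n_shuffles):
        residues = list(sequence)
        rng.shuffle(residues)
        background.append(score_fn("".join(residues)))
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("shuffled scores have no spread; z-score undefined")
    return (native - statistics.mean(background)) / sd


def count_li(seq):
    """Hypothetical score: -1 for each 'LI' dipeptide (lower = better)."""
    return -sum(1 for a, b in zip(seq, seq[1:]) if a == "L" and b == "I")


print(z_score(count_li, "LILILILI" + "AAAAAAAA"))
```

Shuffling keeps the amino-acid composition fixed, so the z-score measures how much of the fit comes from the order of the residues rather than from which residues are present.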

  16. Threading
      • Robetta: http://robetta.bakerlab.org/
      • 3DPSSM: http://www.sbg.bio.ic.ac.uk/~3dpssm/
      • bioinbgu: http://www.cs.bgu.ac.il/~bioinbgu/form.html
      • GenTHREADER: http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
      • FROST: http://genome.jouy.inra.fr/frost/

  17. The end…

  18. A) Quantifying the similarity of every pair of structures (an "all-against-all" comparison) gives each structure a position in an abstract high-dimensional space. Peak height reflects the population density of folds; the horizontal axes are the first two eigenvectors (i.e. those associated with the two largest eigenvalues), and the vertical axis gives the number of folds. The distribution of architectures is given by the projection onto this plane (proximity on the plane indicates structural similarity between two proteins).
      B) 40% of all known domains are covered by 16 fold classes. These 16 folds are shown as topology diagrams of secondary structures within the class of their attractor (the attractor number is the same as in panel A).
      Figures from Holm and Sander (1996), "Mapping the protein universe".
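
Panel A places each structure using the two leading eigenvectors of an all-against-all similarity matrix. A minimal sketch of that projection, assuming a symmetric similarity matrix whose two largest eigenvalues are positive and dominate in magnitude (power iteration with deflation is used only to keep the code self-contained; this is not Holm and Sander's exact procedure, and the 4 × 4 matrix is invented):

```python
import math


def power_iteration(matrix, n_iter=500):
    """Dominant eigenpair of a symmetric matrix by repeated multiplication."""
    n = len(matrix)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-uniform start vector
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient (unit-length v)
    return eigenvalue, v


def top2_projection(similarity):
    """Project each structure (row of the similarity matrix) onto the two leading eigenvectors."""
    n = len(similarity)
    lam1, v1 = power_iteration(similarity)
    # Deflation: remove the first component so the next dominant eigenpair is the second
    deflated = [[similarity[i][j] - lam1 * v1[i] * v1[j] for j in range(n)] for i in range(n)]
    lam2, v2 = power_iteration(deflated)
    # Row k of S projected onto an eigenvector v equals lam * v[k]
    return [(lam1 * v1[k], lam2 * v2[k]) for k in range(n)]


# Invented all-against-all similarities: structures 0-1 and 2-3 form two groups
S = [[1.0, 0.9, 0.1, 0.1],
     [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.9],
     [0.1, 0.1, 0.9, 1.0]]
for k, (x, y) in enumerate(top2_projection(S)):
    print(k, round(x, 3), round(y, 3))
```

All four structures share the same first coordinate here, while the second coordinate gives structures 0 and 1 one sign and structures 2 and 3 the opposite sign, so the two groups separate on the plane as in panel A.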

  19. Threading: evaluation function

  20. Sequence/structure alignment method

  21. Sequence/structure alignment method (2)

  22. Score normalization
