
Protein Tertiary Structure Prediction

Protein Tertiary Structure Prediction. Structural Bioinformatics. The different levels of protein structure. Primary: the linear amino acid sequence. Secondary: α-helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded polypeptide chain.

marlee

Presentation Transcript


  1. Protein Tertiary Structure Prediction. Structural Bioinformatics.

  2. The Different Levels of Protein Structure. Primary: the linear amino acid sequence. Secondary: α-helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded polypeptide chain.

  3. How can we view the protein structure? • Download the coordinates of the structure from the PDB: http://www.rcsb.org/pdb/ • Launch a 3D viewer program. For example, we will use PyMOL, which can be downloaded freely from the PyMOL homepage: http://pymol.sourceforge.net/ • Load the coordinates into the viewer.
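A coordinate file downloaded from the PDB is plain text, so the atoms can also be read without a viewer. The following is a minimal sketch, assuming the classic fixed-column PDB format for ATOM records. The three sample lines are made up for illustration and are not taken from a real entry; `Atom`, `parse_atoms` and `SAMPLE_PDB` are hypothetical names.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA" for the alpha carbon
    res_name: str  # three-letter residue name, e.g. "MET"
    chain: str     # chain identifier
    res_seq: int   # residue number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM records using the fixed column layout of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip headers, HETATM records, END, etc.
        atoms.append(Atom(
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


# Made-up three-atom fragment, for illustration only.
SAMPLE_PDB = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N\n"
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C\n"
    "ATOM      3  CA  GLY A   2      24.000  26.500   5.100  1.00 11.00           C\n"
    "END\n"
)

atoms = parse_atoms(SAMPLE_PDB)
ca_atoms = [a for a in atoms if a.name == "CA"]
print(len(atoms), "atoms,", len(ca_atoms), "C-alpha atoms")
```

Keeping only the Cα atoms gives one point per residue, which is the usual starting point for the structure comparisons discussed later in the slides.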

  4. PyMOL example • Launch PyMOL • Open file “1aqb” (PDB coordinate file) • Display sequence • Hide everything • Show main chain / hide main chain • Show cartoon • Color by ss (secondary structure) • Color red • Color green, resi 1:40. Help: http://pymol.sourceforge.net/newman/user/toc.html
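The menu steps on this slide can also be written as a PyMOL command script. The sketch below only builds the script text in Python; it does not call PyMOL itself. It assumes the coordinates were saved locally as `1aqb.pdb`, and the function name `build_pymol_script` is hypothetical. The secondary-structure colors (red helices, yellow strands, green loops) are one possible choice, not necessarily what the menu entry "by ss" produces.

```python
def build_pymol_script(pdb_file: str, first: int, last: int) -> str:
    """Return PyMOL commands that mirror the steps on the slide."""
    commands = [
        f"load {pdb_file}",              # open the coordinate file
        "set seq_view, 1",               # display the sequence
        "hide everything",
        "show lines, name N+CA+C+O",     # show main chain
        "hide lines",                    # hide main chain again
        "show cartoon",
        "color red, ss h",               # helices
        "color yellow, ss s",            # strands
        "color green, ss l+''",          # loops and unassigned residues
        "color red",                     # color everything red
        f"color green, resi {first}-{last}",  # then highlight a residue range
    ]
    return "\n".join(commands) + "\n"


script = build_pymol_script("1aqb.pdb", 1, 40)
print(script)
```

Saved to a file such as `view.pml` (a hypothetical name), the script can be run from the PyMOL command line with `@view.pml`.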

  5. Predicting 3D Structure: an outstanding, difficult problem. Based on sequence homology: • Comparative modeling (homology modeling). Based on structural homology: • Fold recognition (threading).

  6. Based on sequence homology: Comparative Modeling. Similar sequences suggest similar structures.

  7. Sequence and structure alignments of two retinol-binding proteins

  8. Structure Alignments. There are many different algorithms for structural alignment. The outputs of a structural alignment are a superposition of the atomic coordinates and the minimal root mean square deviation (RMSD) between the structures. The RMSD of two aligned structures indicates their divergence from one another: low RMSD values mean similar structures.
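For N pairs of equivalent atoms separated by distances d_i after superposition, RMSD = sqrt((1/N) · Σ d_i²). The sketch below computes this value for two coordinate lists that are assumed to be already superposed and listed in matching order. It does not search for the optimal superposition, which a real structural alignment program would do first. The function name `rmsd` and the sample coordinates are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD between two equally long, already superposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # squared distance between one pair of equivalent atoms
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates: structure_b is structure_a shifted by 2 units along z.
structure_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 4.0, 0.0)]
structure_b = [(0.0, 0.0, 2.0), (4.0, 0.0, 2.0), (4.0, 4.0, 2.0)]

print(rmsd(structure_a, structure_a))
print(rmsd(structure_a, structure_b))
```

Because every atom in the second example is displaced by the same 2 units, the RMSD equals that displacement; identical structures give 0.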

  9. Dali (Distance mAtrix aLIgnment). DALI offers pairwise alignments of protein structures. The algorithm uses the three-dimensional coordinates of each protein to calculate matrices of residue-residue distances, and then compares these matrices. See Holm L and Sander C (1993) J. Mol. Biol. 233:123-138. Another tool: SALIGN http://salilab.org/DBALI/?page=tools

  10. Fold classification based on structure-structure alignment of proteins (FSSP). FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length) using DALI. Representative sets exclude sequence homologs sharing >25% amino acid identity. http://www.ebi.ac.uk/dali/fssp
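The distance matrix at the heart of this approach is easy to picture in code. The sketch below builds the matrix of pairwise distances for one set of residue coordinates (for example Cα atoms) and compares two such matrices with a plain mean absolute difference. That comparison is a simplification chosen for illustration and is not the DALI scoring scheme; `distance_matrix`, `matrix_difference` and the coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Point]) -> List[List[float]]:
    """Matrix of pairwise distances between residues of one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def matrix_difference(m1: List[List[float]], m2: List[List[float]]) -> float:
    """Mean absolute difference between two equally sized distance matrices."""
    n = len(m1)
    if n == 0 or n != len(m2):
        raise ValueError("matrices must be non-empty and of equal size")
    total = sum(abs(m1[i][j] - m2[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)


# Made-up coordinates for three residues.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
# The same shape moved elsewhere in space.
moved = [(x + 10.0, y - 5.0, z + 1.0) for x, y, z in coords]

dm = distance_matrix(coords)
print(dm[0][1], dm[0][2])
print(matrix_difference(dm, distance_matrix(moved)))
```

Internal distances do not change when a structure is translated or rotated, which is why distance matrices allow two structures to be compared without superposing them first.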

  11. Based on sequence homology: Comparative Modeling. Similar sequence suggests similar structure. Comparative structure prediction produces an all-atom model of a sequence, based on its alignment to one or more related protein structures in the database.

  12. Based on sequence homology: Comparative Modeling • The accuracy of a comparative model is related to the sequence identity on which it is based: >50% sequence identity = high accuracy; 30%-50% sequence identity = about 90% of the structure modeled; <30% sequence identity = low accuracy (many errors).
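As a rough illustration of these identity bands, the sketch below computes percent identity from two aligned sequences and maps it onto the three categories of slide 12. It assumes identity is counted over aligned columns where neither sequence has a gap; other definitions of the denominator exist. The function names, category labels and sequences are made up for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns without a gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def expected_accuracy(identity: float) -> str:
    """Map percent identity onto the three bands quoted on the slide."""
    if identity > 50:
        return "high accuracy"
    if identity >= 30:
        return "intermediate accuracy"
    return "low accuracy (many errors)"


# Made-up alignment: 4 identical residues out of 5 gap-free columns.
target = "ACDE-G"
template = "ACQEFG"
identity = percent_identity(target, template)
print(identity, expected_accuracy(identity))
```

In practice the identity would come from the alignment between the target sequence and the chosen template structure.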

  13. Homology Threshold for Different Alignment Lengths. [Figure: homology threshold t plotted against alignment length L.] A sequence alignment between two proteins is considered to imply structural homology if the sequence identity is equal to or above the homology threshold t in a sequence region of a given length L. The threshold values t(L) are derived from the PDB.

  14. Comparative Modeling • Similarity particularly high in core • Alpha helices and beta sheets preserved • Even near-identical sequences vary in loops

  15. Based on sequence homology: Comparative Modeling Methods • MODELLER (Sali, Rockefeller/UCSF) • SCWRL (Dunbrack, UCSF) • SWISS-MODEL http://swissmodel.expasy.org//SWISS-MODEL.html

  16. Based on sequence homology: Comparative Modeling. Modeling of a sequence based on known structures consists of four major steps: (1) finding one or more known structures (templates) related to the sequence to be modeled, using sequence comparison methods such as PSI-BLAST; (2) aligning the sequence with the templates; (3) building a model; (4) assessing the model.

  17. Based on Structure homology Fold Recognition

  18. Based on Structure homology Protein Folds • A combination of secondary structural units • Forms basic level of classification • Each protein family belongs to a fold • Different sequences can share similar folds

  19. Based on secondary structure. Protein folds: the sequential and spatial arrangement of secondary structures. Examples: hemoglobin, TIM.

  20. Based on Structure homology Protein Folds • A combination of secondary structural units • Forms basic level of classification • Each protein family belongs to a fold • Different sequences can share similar folds

  21. Similar folds usually mean similar function. Example: transcription factors (homeodomain).

  22. Based on Structure homology Protein Folds • A combination of secondary structural units • Forms basic level of classification • Each protein family belongs to a fold • Different sequences can share similar folds

  23. The same fold can have multiple functions. Rossmann fold: 12 functions. TIM barrel: 31 functions.

  24. SCOP: Structural Classification of Proteins • Fold classification: • Class: All alpha, All beta, Alpha/beta, Alpha+beta • Fold • Superfamily • Family

  25. Retinol Binding Protein

  26. Based on structure homology: Fold Recognition • Methods of protein fold recognition attempt to detect similarities between protein 3D structures that have no significant sequence similarity. • They search for folds that are compatible with a particular sequence. • They “turn the protein folding problem on its head”: rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.

  27. Based on structure homology. Basic steps in fold recognition: compare the sequence against a library of all known protein folds (a finite number). Query sequence: MTYGFRIPLNCERWGHKLSTVILKRP... Goal: find the fold template that the sequence fits best. There are different ways to evaluate sequence-structure fit.

  28. Based on secondary structure homology. There are different ways to evaluate sequence-structure fit. [Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is scored against each potential fold in the library (1 ... 56 ... n), giving fit scores such as -10, -123 and 20.5.]
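The idea of scoring one sequence against every fold in a library can be sketched with a toy example. Here each fold template is reduced to a string of per-position environments (`B` for buried, `E` for exposed), and a residue scores +1 when a hydrophobic residue sits in a buried position or a polar residue in an exposed one, and -1 otherwise. The scoring rule, the hydrophobic residue set, the three-template library and the convention that a higher score means a better fit are all assumptions made for illustration. Real threading programs allow gaps and use far richer scoring functions, and energy-like scores are often read the other way round (lower is better).

```python
from typing import Dict, Tuple

# Assumed hydrophobic residue set, for illustration only.
HYDROPHOBIC = set("AVILMFWC")


def fit_score(sequence: str, environments: str) -> int:
    """Toy sequence-structure fit: +1 per compatible position, -1 otherwise."""
    if len(sequence) != len(environments):
        raise ValueError("this toy example needs equal lengths (no gaps)")
    score = 0
    for residue, env in zip(sequence, environments):
        is_hydrophobic = residue in HYDROPHOBIC
        is_buried = env == "B"
        score += 1 if is_hydrophobic == is_buried else -1
    return score


def best_fold(sequence: str, library: Dict[str, str]) -> Tuple[str, Dict[str, int]]:
    """Score the sequence against every template and return the best one."""
    scores = {name: fit_score(sequence, env) for name, env in library.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical mini-library of fold templates.
FOLD_LIBRARY = {
    "helix_like": "BEEBBEEB",
    "strand_like": "BEBEBEBE",
    "all_exposed": "EEEEEEEE",
}

best, scores = best_fold("LKKLLEEL", FOLD_LIBRARY)
print(best, scores)
```

The query is compared with every template in turn, and the template with the best score is reported as the predicted fold.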

  29. Based on secondary structure homology. Programs for fold recognition • TOPITS (Rost 1995) • GenTHREADER (Jones 1999) • SAM-T02 (UCSC HMM) • 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/

  30. Ab Initio Modeling • Compute the molecular structure from the laws of physics and chemistry alone. Theoretically the ideal solution; practically nearly impossible. Why? • Exceptionally complex calculations • Incomplete understanding of the biophysics

  31. Ab Initio Methods • Rosetta (Baker lab, Seattle) • Undertaker (Karplus, UCSC)

  32. CASP: Critical Assessment of protein Structure Prediction • A competition among different groups to predict the 3D structures of proteins that are about to be solved experimentally. • Current state: • Ab initio: the worst, but greatly improved in recent years. • Comparative modeling: performs very well when homologous sequences with known structures exist. • Fold recognition: performs well.

  33. What’s Next Predicting function from structure

  34. Structural Genomics: a large-scale structure determination project designed to cover all representative protein structures. Example: ATP binding domain of protein MJ0577. Zarembinski et al., Proc. Natl. Acad. Sci. USA, 95:15189 (1998)

  35. ~300 unique folds in PDB ~800 unique folds Currently

  36. ~1000- 3000 unique folds Estimated in “structure space”

  37. Structural genomics expectations: ~5 proteins to characterize the sequence space corresponding to 1 fold; ~10000-15000 new structures expected.

  38. Wanted! Automated methods to predict function from the protein structures resulting from the structural genomics project. As a result of the structural genomics initiative, many structures of proteins with unknown function will be solved.

  39. Approaches for predicting function from structure. ConSurf: mapping evolutionary conservation onto the protein structure. http://consurf.tau.ac.il/
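To make the notion of per-position conservation concrete, the sketch below scores each column of a multiple sequence alignment by its Shannon entropy: 0 for a fully conserved column, higher for more variable ones. This is a simplified stand-in chosen for illustration; ConSurf's own scoring is more elaborate. The function names and the four-sequence alignment are made up.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of the residues in one column, ignoring gaps."""
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) summed over the residue types seen in this column
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conservation_profile(alignment: List[str]) -> List[float]:
    """Entropy of every column; low values mean conserved positions."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*alignment)]


# Made-up alignment of four short sequences.
MSA = ["ACDE", "ACDF", "ACEG", "ACEH"]
print(conservation_profile(MSA))
```

Each score belongs to one residue position, so the profile can be used to color the corresponding residues on the 3D structure, with conserved positions highlighted as candidate functional sites.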

  40. Approaches for predicting function from structure. PFplus (PatchFinderPlus): identifying positive electrostatic patches on the protein structure. http://pfp.technion.ac.il/

  41. Approaches for predicting function from structure. SHARP2: identifying surface patches likely to be protein-protein interaction sites on the protein structure. http://www.bioinformatics.sussex.ac.uk/SHARP2
