Prediction of protein structure

Presentation Transcript

  1. Prediction of protein structure

  2. Aim • Structure prediction tries to build models of the 3D structures of proteins that are useful for understanding structure-function relationships.

  3. GenBank/EMBL 105,000,000 • UniProt 5,200,000 • PDB 47,000

  4. DNA sequence Protein sequence Molecular recognition 3D structure

  5. The protein folding problem • The information for the 3D structure is encoded in the protein sequence • Proteins fold into their native structure within seconds • Native structures are both thermodynamically stable and kinetically accessible

  6. Ab-initio prediction • Prediction from sequence (AVVTW...GTTWVR) using first principles

  7. Ab-initio prediction • “In theory”, we should be able to build native structures from first principles using sequence information and molecular dynamics simulations: “ab-initio prediction of structure” • 1 µs folding simulation of a model protein (Duan-Kollman: Science, 277, 1793, 1998). • Simulations of reversible peptide folding (20-200 ns) (Daura et al., Angew. Chem., 38, 236, 1999). • Distributed folding simulations of villin (36 residues) (Zagrovic et al., JMB, 323, 927, 2002).

  8. ... the bad news ... • It is not possible to extend simulations to the “seconds” range • Simulations are limited to small systems and to fast folding/unfolding events in known structures • Steered dynamics • Biased molecular dynamics • Simplified systems

  9. Typical shortcuts • Reduce conformational space • 1-2 atoms per residue • Fixed lattices • Statistical force fields obtained from known structures • Average distances between residues • Interactions • Use building blocks: 3-9 residues from PDB structures
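
Statistical force fields of the kind listed in slide 9 are commonly derived with an inverse Boltzmann relation, E(d) = -kT ln(f_obs(d) / f_ref(d)), where the frequencies come from distances observed in known structures. The following is a minimal sketch under that assumption; the function name, the binning, and the counts are hypothetical and do not come from the presentation.

```python
import math

def statistical_potential(observed_counts, reference_counts, kT=1.0):
    """Inverse Boltzmann: E(bin) = -kT * ln(f_obs / f_ref) per distance bin.

    Counts are normalised to frequencies first. Bins with a zero count get
    None because the logarithm is undefined there.
    """
    total_obs = sum(observed_counts)
    total_ref = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            energies.append(None)
            continue
        f_obs = obs / total_obs
        f_ref = ref / total_ref
        energies.append(-kT * math.log(f_obs / f_ref))
    return energies

# Hypothetical counts of residue-pair distances in four distance bins.
observed = [10, 40, 30, 20]    # one specific residue pair type
reference = [25, 25, 25, 25]   # all residue pairs (reference state)
energies = statistical_potential(observed, reference)
```

Bins that are populated more often than the reference get a negative (favourable) energy, and under-populated bins get a positive one.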

  10. “lattice” folding
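
Lattice folding is often illustrated with the HP model, where each residue is hydrophobic (H) or polar (P) and every non-bonded H-H contact on the lattice contributes -1 to the energy. The sketch below assumes that model on a 2D square lattice; the presentation does not state which lattice model it refers to, so the function and the example conformations are illustrative only.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on a 2D square lattice.

    sequence: string of 'H'/'P'; coords: list of (x, y) lattice points,
    one per residue. Each H-H pair that is adjacent on the lattice but not
    consecutive in the chain contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    energy -= 1
    return energy

# Four residues folded into a square (ends in contact) versus fully extended.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
```

A folding search on the lattice would enumerate or sample self-avoiding conformations and keep the one with the lowest energy.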

  11. Example PROSA potential • [Plot labels: low stability, very stable; hydrophobic, Cβ-Cβ, total] • http://lore.came.sbg.ac.at:8080/CAME/CAME_EXTERN/ProsaII/index_html

  12. Results from ab-initio • Average error 5 Å - 10 Å • Function cannot be predicted • Long simulations • Example: a protein from E. coli predicted at 7.6 Å (CASP3, H. Scheraga)
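
Errors such as the 5 Å - 10 Å quoted in slide 12 are usually reported as a root-mean-square deviation (RMSD) between equivalent atoms of the model and the experimental structure. A minimal sketch, assuming the two coordinate sets are already optimally superposed and list the same atoms in the same order (for example one Cα per residue); the coordinates below are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    No superposition is performed: both sets are assumed to be in the
    same reference frame already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Hypothetical three-atom "structures"; the model is shifted by 3 Å along x.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(3.0, 0.0, 0.0), (6.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
error = rmsd(native, model)
```

In practice the model is first superposed onto the reference by a least-squares fit, which this sketch leaves out.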

  13. comparative modelling • The most efficient way to predict protein structure is to compare with known 3D structures

  14. Protein folds

  15. Basic concept • In a given protein, 3D structure is a more conserved characteristic than sequence • Some amino acids are “equivalent” to each other • Evolutionary pressure allows only amino acid substitutions that keep the 3D structure largely unaltered • Two proteins of “similar” sequence must have the “same” 3D structure

  16. Possible scenarios • (1) Homology can be recognized using sequence comparison tools or protein family databases (BLAST, Clustal, Pfam, ...): structural and functional predictions are feasible. • (2) Homology exists but cannot be recognized easily (PSI-BLAST, threading): low-resolution fold predictions are possible; no functional information. • (3) No homology: 1D predictions, sequence motifs, limited functional prediction, ab-initio prediction.

  17. fold prediction

  18. 3D struc. prediction

  19. 1D prediction • Prediction is based on averaging amino acid properties over a window • Example sequence: AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS
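
The window averaging in slide 19 can be sketched as a sliding mean of a per-residue property. The example below uses the Kyte-Doolittle hydropathy scale and a window of 7 residues; both choices are assumptions for illustration, since the slide does not name a scale or a window size.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_average(sequence, scale, window=7):
    """Mean property value over a centred window, one value per full window.

    The first value corresponds to residue index window // 2 (0-based);
    positions too close to either end are skipped.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    profile = []
    for centre in range(half, len(sequence) - half):
        segment = sequence[centre - half:centre + half + 1]
        profile.append(sum(scale[aa] for aa in segment) / window)
    return profile

# Sequence shown on slide 19.
sequence = "AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS"
profile = window_average(sequence, KYTE_DOOLITTLE, window=7)
# 0-based residue index at the centre of the most hydrophobic window.
peak_centre = profile.index(max(profile)) + 7 // 2
```

Peaks in such a profile flag hydrophobic stretches, which is the idea behind simple transmembrane-segment predictors.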

  20. 1D prediction. Properties • Secondary structure propensities • Hydrophobicity (transmembrane) • Accessibility • ...

  21. Propensities • Chou-Fasman • Biochemistry 17, 4277 (1978) • α, β, turn
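
A Chou-Fasman-style propensity compares how often a residue type occurs in a given conformation with how often any residue does. The sketch below computes it from sequences annotated with secondary structure; the data are a hypothetical toy set, not the published Chou-Fasman parameters, and the state letters (H, E, C) are an assumed convention.

```python
from collections import Counter

def propensities(sequences, structures, state):
    """Propensity of each residue type for one secondary-structure state.

    P(aa) = (fraction of aa residues found in `state`)
            / (fraction of all residues found in `state`)
    Values above 1 mean the residue type favours the state.
    """
    total = Counter()
    in_state = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and annotation must have equal length")
        for aa, s in zip(seq, ss):
            total[aa] += 1
            if s == state:
                in_state[aa] += 1
    n_state = sum(in_state.values())
    if n_state == 0:
        raise ValueError("state never observed in the data")
    overall = n_state / sum(total.values())
    return {aa: (in_state[aa] / total[aa]) / overall for aa in total}

# Hypothetical toy data: H = helix, E = strand, C = coil.
toy_sequences = ["AAEELGV", "VIVGAAL"]
toy_structures = ["HHHHCCE", "EEECHHH"]
helix = propensities(toy_sequences, toy_structures, "H")
```

With a realistic set of known structures, the same calculation gives tables of helix, strand, and turn propensities per amino acid.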

  22. Some programs (www.expasy.org) • BCM PSSP - Baylor College of Medicine • Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction • GOR I (Garnier et al, 1978) [At PBIL or at SBDS] • GOR II (Gibrat et al, 1987) • GOR IV (Garnier et al, 1996) • HNN - Hierarchical Neural Network method (Guermeur, 1997) • Jpred - A consensus method for protein secondary structure prediction at University of Dundee • nnPredict - University of California at San Francisco (UCSF) • PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University • PSA - BioMolecular Engineering Research Center (BMERC) / Boston • PSIpred - Various protein structure prediction methods at Brunel University • SOPM (Geourjon and Deléage, 1994) • SOPMA (Geourjon and Deléage, 1995) • AGADIR - An algorithm to predict the helical content of peptides

  23. 1D Prediction • Original methods: a single sequence and uniform parameters (25-30%) • First improvements: parameters specific to protein classes • Present methods use sequence profiles obtained from multiple alignments, and neural networks to extract parameters (70-75%; 98% for transmembrane helices)

  24. PredictProtein (PHD) • Builds a multiple alignment using Swiss-Prot, PROSITE, and domain databases • 1D prediction from the generated profile using neural networks • Fold recognition • Confidence evaluation

  25. PredictProtein: available information • Multiple alignments: MaxHom • PROSITE motifs • Composition bias: SEG • Threading: TOPITS • Secondary structure: PHDsec, PROFsec • Transmembrane helices: PHDhtm, PHDtop • Globularity: GLOBE • Coiled coil: COILS • Disulfide bridges: CYSPRED

  26. PredictProtein: available information • Signal peptides: SignalP • O-glycosylation: NetOglyc • Chloroplast import signal: ChloroP • Consensus secondary structure: JPRED • Transmembrane: TMHMM, TOPPRED • SwissModel

  27. Methods for remote homology • Homology can be recognized using PSI-BLAST • Fold prediction is possible using threading methods • Accurate 3D prediction is not possible: no structure-function relationship can be inferred from the models

  28. Threading • The unknown sequence is “folded” onto a number of known structures • Scoring functions evaluate the fit between sequence and structure according to statistical functions and sequence comparison
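
The idea in slide 28 can be sketched with a deliberately simplified score: predicted secondary structure and accessibility of the query are compared, position by position, with the values observed in each known structure, and the templates are ranked by the number of agreements. This is an assumption-laden toy (no gaps, no statistical potential, equal-length strings only), and the template names and strings are hypothetical.

```python
def compatibility(pred_ss, pred_acc, obs_ss, obs_acc):
    """Ungapped sequence-structure compatibility score.

    Adds 1 for every position where predicted secondary structure matches
    the observed one, and 1 where predicted accessibility (e = exposed,
    b = buried; case-insensitive) matches the observed one.
    """
    lengths = {len(pred_ss), len(pred_acc), len(obs_ss), len(obs_acc)}
    if len(lengths) != 1:
        raise ValueError("this sketch only handles equal-length strings")
    score = 0
    for p_ss, p_acc, o_ss, o_acc in zip(pred_ss, pred_acc, obs_ss, obs_acc):
        score += int(p_ss == o_ss) + int(p_acc.upper() == o_acc.upper())
    return score

def rank_templates(pred_ss, pred_acc, library):
    """Return (score, name) pairs for all templates, best first."""
    scored = [
        (compatibility(pred_ss, pred_acc, obs_ss, obs_acc), name)
        for name, (obs_ss, obs_acc) in library.items()
    ]
    return sorted(scored, reverse=True)

# Hypothetical query predictions and template library.
query_ss = "HHHHCCBB"
query_acc = "eebbeebe"
library = {
    "template_1": ("HHHHCCBB", "EEBBEEBB"),
    "template_2": ("BBBBCCHH", "BEBEBEBE"),
}
ranking = rank_templates(query_ss, query_acc, library)
```

Real threading programs align the sequence to each structure with gaps and score the fit with statistical potentials, but the ranking step works the same way: the highest-scoring structure is the selected hit.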

  29. [Figure: the query ATTWV....PRKSCT is scored against candidate structures; selected hit: 10.5 > 5.2]

  30. Sequence             ATTWV....PRKSCT
      Pred. sec. struc.    HHHHH....CCBBBB
      Pred. accessibility  eeebb....eeebeb
      ..........
      Sequence   GGTV....ATTW  ...........  ATTVL....FFRK
      Obs. SS    BBBB....CCHH  ...........  HHHB.....CBCB
      Obs. acc.  EEBE.....BBEB ...........  BBEBB....EBBE

  31. Threading accuracy

  32. Comparative modelling • Good for sequence identity > 30% • Accuracy is very high for sequence identity > 60% • Reminder • The model must be USEFUL • Only the “interesting” regions of the protein need to be modelled

  33. Expected accuracy • Strongly dependent on the quality of the sequence alignment • Strongly dependent on the identity with the “template” structures. Very good structures if identity > 60-70%. • Quality of the model is better in the backbone than in the side chains • Quality of the model is better in conserved regions

  34. Steps • Choose templates: proteins with an experimental 3D structure and significant homology (BLAST, PFAM, PDB) • Build a multiple alignment of the templates. • Alignment quality is critical for accuracy. Always use structure-based alignment. • Reduce redundancies

  35. Template alignment

  36. Steps • Alignment of template structures • Alignment of unknown sequence against template alignment • Structural alignment may not coincide with evolution-based alignment. • Gaps must be chosen to minimize structure distortion

  37. PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS (green)
      PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS (red)
      PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS (blue)
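
The two gap placements in slide 37 can be compared numerically by computing the percent identity of each alignment against the green sequence. The sketch below writes the slide's residues in one-letter code and counts identities over aligned (non-gap) columns only; that denominator is an assumption, since other definitions of percent identity exist. The `expected_quality` helper is hypothetical and simply encodes the 30% and 60% thresholds quoted in slides 32 and 33.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            matches += 1
    if aligned == 0:
        return 0.0
    return 100.0 * matches / aligned

def expected_quality(identity):
    """Rough label based on the thresholds quoted in slides 32-33."""
    if identity > 60:
        return "very good"
    if identity > 30:
        return "good"
    return "remote homology: threading or ab-initio"

# Slide 37 sequences in one-letter code.
green = "FDICRLPGSAEAVC"
red = "FNVCRTP---EAIC"
blue = "FNVCR---TPEAIC"
identity_red = percent_identity(green, red)
identity_blue = percent_identity(green, blue)
```

Here the red gap placement keeps 7 of 11 aligned positions identical and the blue one only 6 of 11, so the choice of gap position changes the apparent identity as well as the structural distortion.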

  38. Steps • Alignment of template structures • Alignment of unknown sequence against template alignment • Build structure of conserved regions (SCR) • Coordinates come from either a single structure or averages. • Side chains are adapted to the original or placed in standard conformations

  39. Steps • Alignment of template structures • Alignment of unknown sequence against template alignment • Build structure of conserved regions (SCR) • Build unconserved regions (usually “loops”)

  40. “Loops” • Ab initio • PDB