Protein Structure Prediction

Protein Structure Prediction. Xiaole Shirley Liu and Jun Liu, STAT115. Protein Structure Prediction, Ram Samudrala, University of Washington. Outline: motivations and introduction; protein 2nd structure prediction; protein 3D structure prediction; CASP; homology modeling; fold recognition.

Presentation Transcript


  1. Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115

  2. Protein Structure Prediction Ram Samudrala University of Washington

  3. Outline • Motivations and introduction • Protein 2nd structure prediction • Protein 3D structure prediction • CASP • Homology modeling • Fold recognition • ab initio prediction • Manual vs automation • Structural genomics

  4. Protein Structure • Sequence determines structure, structure determines function • Most proteins can fold by themselves very quickly • Folded structure: lowest energy state

  5. Protein Structure • Main forces to consider • Steric complementarity • Secondary structure preferences (satisfy H bonds) • Hydrophobic/polar patterning • Electrostatics

  6. Rationale for understanding protein structure and function • Protein sequence: large numbers of sequences, including whole genomes • Protein structure: three-dimensional, complicated, mediates function • Protein function: rational drug design and treatment of disease; protein and genetic engineering; build networks to model cellular pathways; study organismal function and evolution • Diagram link labels: structure determination, structure prediction, homology, rational mutagenesis, biochemical analysis, model studies, ?

  7. Protein Databases • SwissProt: protein knowledgebase • PDB: Protein Data Bank, 3D structure

  8. View Protein Structure • Free interactive viewers • Download 3D coordinate file from PDB • Quick and dirty: VRML, RasMol, Chime • More powerful: Swiss-PdbViewer

  9. Compare Protein Structures • Structure is more conserved than sequence • Why compare? • Detect evolutionary relationships • Identify recurring structural motifs • Predict function based on structure • Assess predicted structures • Protein structure comparison and classification: manual (SCOP), automated (DALI)

  10. Compare protein structures • Need ways to determine if two protein structures are related and to compare predicted models to experimental structures • Commonly used measure is the root mean square deviation (RMSD) of the Cartesian coordinates of the atoms between two structures after optimal superposition (McLachlan, 1979): $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^2}$, where $d_i$ is the distance between the $i$-th pair of equivalent atoms • Usually use Cα atoms • Other measures include contact maps and torsion angle RMSDs • Figure labels: NK-lysin (1nkl); Bacteriocin T102/as48 (1e68); T102 best model; 3.6 Å; 2.9 Å
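
As a minimal sketch of the RMSD measure on this slide, the function below computes the root mean square deviation over paired atoms. It assumes the two coordinate sets are already optimally superposed and listed in matching order; the superposition step itself is not shown. The function name `rmsd` and the toy coordinates are illustrative, not taken from the slides.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superposed and that atom i in
    coords_a is equivalent to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy C-alpha traces (made-up coordinates, in angstroms)
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
native = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(model, native))
```

Every atom in the toy model is displaced by exactly 1 Å, so the RMSD equals that displacement.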

  11. SCOP • Compare protein structure, identify recurring structural motifs, predict function • A. Murzin et al., 1995 • Manual classification • A few folds are highly populated • 5 folds contain 20% of all homologous superfamilies • Some folds are multifunctional

  12. Determine Protein Structure • X-ray crystallography (gold standard) • Grow crystals (the rate-limiting step); relies on the repeating structure of a crystalline lattice • Collect a diffraction pattern • Map to real space electron density, build and refine structural model • Painstaking and time consuming

  13. Protein Structure Prediction • Since AA sequence determines structure, can we predict protein structure from its AA sequence? This amounts to predicting the three angles, with unlimited degrees of freedom (DoF)! • Physical properties that determine fold • Rigidity of the protein backbone • Interactions among amino acids, including • Electrostatic interactions • van der Waals forces • Volume constraints • Hydrogen and disulfide bonds • Interactions of amino acids with water

  14. Protein folding landscape • Large multi-dimensional space of changing conformations • Figure (free energy vs. folding reaction): labels unfolded, molten globule, native; barrier height ΔG*; time scales 10^-3 s and 10^-8 s

  15. Protein primary structure • Twenty types of amino acids • Two amino acids join by forming a peptide bond • Each residue in the amino acid main chain has two degrees of freedom (φ and ψ) • The amino acid side chains can have up to four degrees of freedom (χ1-4) • Figure: chemical diagrams of the peptide backbone with the φ, ψ and χ torsion angles labeled

  16. 2nd Structure Prediction • α helix, β sheet, turn/loop

  17. 2nd Structure Prediction • Chou-Fasman 1974 • Based on 15 proteins (2473 AAs) of known conformation, determine Pα, Pβ, ranging from 0.5 to 1.5 • Empirical rules for 2nd structure nucleation • Helix: 4 Hα or hα out of 6 AA, extend in both directions, Pα > 1.03, Pα > Pβ, no α breakers • Sheet: 3 Hβ or hβ out of 5 AA, extend in both directions, Pβ > 1.05, Pβ > Pα, no β breakers • Have ~50-60% accuracy
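
The helix nucleation rule on this slide (at least 4 helix formers in a window of 6 residues) can be sketched as a sliding-window scan. This is only the nucleation step: extension, the propensity thresholds, and breaker checks are left out. The set of helix formers below is the grouping commonly quoted for Chou-Fasman and should be treated as an assumption; the function name and the test sequence are made up for illustration.

```python
# Helix formers (strong H and weak h classes) as commonly quoted for
# Chou-Fasman; this set is an illustrative assumption, not from the slides.
HELIX_FORMERS = set("EMALKFQWIV")

def helix_nuclei(seq, window=6, min_formers=4, formers=HELIX_FORMERS):
    """Return start indices of windows holding at least min_formers formers."""
    starts = []
    for i in range(len(seq) - window + 1):
        count = sum(1 for aa in seq[i:i + window] if aa in formers)
        if count >= min_formers:
            starts.append(i)
    return starts

# Made-up sequence: a helix-favoring stretch flanked by Gly/Pro/Ser
print(helix_nuclei("GPGSAEELLKGPGS"))
```

Passing `window=5, min_formers=3` with a set of sheet formers would give the analogous scan for the β-sheet rule.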

  18. Pα and Pβ

  19. 2nd Structure Prediction • Garnier, Osguthorpe, Robson, 1978 • Assumption: each AA influenced by flanking positions • GOR scoring tables (problem: limited dataset) • Add scores, assign the 2nd structure with the highest score

  20. 2nd Structure Prediction • D. Eisenberg, 1986 • Plot hydrophobicity as function of sequence position, look for periodic repeats • Period = 3-4 AA, α helix (3.6 AA / turn) • Period = 2 AA, β sheet • Best overall: JPRED by Geoffrey Barton, uses many different approaches and takes the consensus • Overall accuracy: 72.9%
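
One way to look for the periodic hydrophobicity repeats described on this slide is a hydrophobic moment: sum the per-residue hydrophobicity as vectors rotated by a fixed angle per residue, then compare the magnitude at about 100° (3.6 residues per turn) with 180° (period 2). The sketch below uses the Kyte-Doolittle hydropathy scale as a stand-in, which is an assumption; the slide does not say which scale was used. The function name and the test peptide are illustrative.

```python
import math

# Kyte-Doolittle hydropathy values (a swapped-in scale; the slide names none)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(seq, angle_deg):
    """Magnitude of the hydrophobicity vector sum at angle_deg per residue."""
    angle = math.radians(angle_deg)
    s = sum(KD[aa] * math.sin(i * angle) for i, aa in enumerate(seq))
    c = sum(KD[aa] * math.cos(i * angle) for i, aa in enumerate(seq))
    return math.hypot(s, c)

# Made-up peptide alternating hydrophobic and polar residues (period 2)
strand_like = "VKVKVKVK"
print(hydrophobic_moment(strand_like, 100.0))  # helix periodicity
print(hydrophobic_moment(strand_like, 180.0))  # strand periodicity
```

For the alternating peptide, the moment at 180° is the larger of the two, consistent with the period-2 pattern the slide associates with β sheets.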

  21. 3D Protein Structure Prediction • CASP contest: Critical Assessment of Structure Prediction • Biennial meeting (every two years) since 1994 at Asilomar, CA • Experimentalists: before CASP, submit sequence of to-be-solved structure to central repository • Predictors: download sequence and minimal information, make predictions in three categories • Assessors: automatic programs and experts to evaluate prediction quality

  22. CASP Category I • Homology Modeling (sequences with high homology to sequences of known structure) • Given a sequence with > 25-30% sequence identity to a known structure in PDB, use the known structure as starting point to create a model of the 3D structure of the sequence • Takes advantage of knowledge of a closely related protein. Use sequence alignment techniques to establish correspondences between known “template” and unknown.

  23. CASP Category II • Fold recognition (sequences with low sequence identity (<= 30%) to sequences of known structure) • Given the sequence, and a set of folds observed in PDB, see if the sequence could adopt one of the known folds • Takes advantage of knowledge of existing structures, and principles by which they are stabilized (favorable interactions)

  24. CASP Category III • Ab initio prediction (no known homology with any sequence of known structure) • Given only the sequence, predict the 3D structure from “first principles”, based on energetic or statistical principles • Secondary structure prediction and multiple alignment techniques used to predict features of these molecules. Some method is then needed for assembling the 3D structure.

  25. Structure Prediction Evaluation • Hydrophobic core similar? • 2nd struct identified? • Energy: minimized? H-bond contacts? • Compare with solved crystal structure: gold standard

  26. Comparative modelling of protein structure • Scan • Align: KDHPFGFAVPTKNPDGTMNLMNWECAIP / KDPPAGIGAPQDN----QNIMLWNAVIP … (identical positions are starred on the slide) • Build initial model • Construct non-conserved side chains and main chains • Refine
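
The identity thresholds quoted for homology modeling can be checked against an alignment with a few lines of code. As a rough sketch, the function below counts identical positions over the columns where neither row has a gap, using the two aligned rows shown on the comparative modelling slide. Dividing by gap-free columns is one convention among several and is an assumption here, as are the function and variable names.

```python
def identity(aln_a, aln_b, gap="-"):
    """Return (matches, aligned) over columns where neither row has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    aligned = matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        aligned += 1
        if a == b:
            matches += 1
    return matches, aligned

# The two aligned rows from the slide
row1 = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
row2 = "KDPPAGIGAPQDN----QNIMLWNAVIP"
matches, aligned = identity(row1, row2)
print(matches, aligned, round(100.0 * matches / aligned, 1))
```

With this convention the pair has 11 identical positions out of 24 gap-free columns, about 46% identity, which is above the 25-30% range quoted for homology modeling.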

  27. Homology Modeling Results • When sequence homology is > 70%, high resolution models are possible (< 3 Å RMSD) • MODELLER (Sali et al.) • Find homologous proteins with known structure and align • Collect distance distributions between atoms in known protein structures • Use these distributions to compute positions for equivalent atoms in alignment • Refine using energetics

  28. Homology Modeling Results • Many things can go wrong: • Bad template: it doesn’t have the same structure as the target after all • Bad alignment (a very common problem) • Good alignment to good template still gives wrong local structure • Bad loop construction • Bad side chain positioning

  29. Homology Modeling Results • Use of sensitive multiple alignment techniques (e.g. PSI-BLAST) helped get the best alignments • Sophisticated energy minimization techniques do not dramatically improve upon the initial guess

  30. Fold Recognition Results • Also called protein threading • Given a new sequence and a library of known folds, find the best alignment of the sequence to each fold, return the most favorable one

  31. Fold Recognition with Dynamic Programming • Environmental class for each AA based on known folds (buried status, polarity, 2nd struct)

  32. Protein Folding with Dynamic Programming • D. Eisenberg 1994 • Align sequence to each fold (a string of environmental classes) • Advantages: fast and works pretty well • Disadvantages: does not consider AA contacts
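
Aligning a sequence to a fold written as a string of environmental classes can be sketched with a standard global-alignment dynamic program. The version below is a toy: it uses only two made-up environment classes (`B` buried, `E` exposed), a hypothetical +1/-1 compatibility score, and a linear gap penalty, whereas a real 3D-1D scoring table would be derived from known structures. All names, classes and scores are assumptions for illustration.

```python
HYDROPHOBIC = set("AVLIMFWC")  # illustrative grouping (assumption)

def fit(aa, env):
    """Toy 3D-1D score: +1 if the residue type suits the environment, else -1."""
    buried_ok = env == "B" and aa in HYDROPHOBIC
    exposed_ok = env == "E" and aa not in HYDROPHOBIC
    return 1 if (buried_ok or exposed_ok) else -1

def thread_score(seq, fold, gap=-1):
    """Best global alignment score of seq against a string of environment classes."""
    rows, cols = len(seq) + 1, len(fold) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + fit(seq[i - 1], fold[j - 1]),  # align residue to position
                dp[i - 1][j] + gap,                               # residue left unaligned
                dp[i][j - 1] + gap,                               # fold position left unaligned
            )
    return dp[-1][-1]

# Hypothetical fold library: each fold is a string of environment classes
folds = {"fold_1": "BEBEBE", "fold_2": "EEEBBB"}
seq = "LKVELS"
best = max(folds, key=lambda name: thread_score(seq, folds[name]))
print(best, thread_score(seq, folds[best]))
```

Because each residue is scored only against the environment of its own position, the table fills in time proportional to sequence length times fold length, which matches the slide's point that the method is fast but ignores residue-residue contacts.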

  33. Fold Recognition Results • Each predictor can submit N top hits • Every predictor does well on something • Common folds (more examples) are easier to recognize • Fold recognition was the surprise performer at CASP1. Incremental progress at CASP2, CASP3, CASP4…

  34. Fold Recognition Results • Alignment (seq to fold) is a big problem

  35. ab initio • Predict interresidue contacts and then compute structure (mild success) • Simplified energy term + reduced search space (phi/psi or lattice) (moderate success) • Creative ways to memorize sequence-structure correlations in short segments from the PDB, and use these to model new structures: ROSETTA

  36. Ab initio prediction of protein structure • Sample conformational space such that native-like conformations are found: astronomically large number of conformations (5 states per residue, 100 residues: 5^100 ≈ 10^70) • Select: hard to design functions that are not fooled by non-native conformations (“decoys”)
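
The size of the search space quoted on this slide follows from a one-line logarithm: with 5 states per residue and 100 residues, the count is 5 to the power 100. The short check below is only arithmetic; the variable names are illustrative.

```python
import math

states_per_residue = 5
residues = 100

# log10(5^100) = 100 * log10(5)
exponent = residues * math.log10(states_per_residue)
print(f"5^100 is about 10^{exponent:.1f}")

# Python integers are exact, so the digit count can be checked directly
print(len(str(states_per_residue ** residues)), "digits")
```

The exponent comes out just under 70, in line with the slide's rounded figure of 10^70.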

  37. Sampling conformational space – continuous approaches • Most work in the field • Molecular dynamics • Continuous energy minimization (follow a valley) • Monte Carlo simulation • Genetic Algorithms • Like real polypeptide folding process • Cannot be sure if native-like conformations are sampled

  38. Molecular dynamics • Force = $-dU/dx$ (slope of potential $U$); acceleration: force = $m \times a(t)$ • All atoms are moving so forces between atoms are complicated functions of time • Analytical solution for $x(t)$ and $v(t)$ is impossible; numerical solution is trivial • Atoms move for very short times of $10^{-15}$ seconds or 0.001 picoseconds (ps) • $x(t+\Delta t) = x(t) + v(t)\,\Delta t + [4a(t) - a(t-\Delta t)]\,\Delta t^2/6$ • $v(t+\Delta t) = v(t) + [2a(t+\Delta t) + 5a(t) - a(t-\Delta t)]\,\Delta t/6$ • $U_{kinetic} = \frac{1}{2}\sum_i m_i v_i(t)^2 = \frac{1}{2} n k_B T$, where $n$ is the number of coordinates (not atoms) • Total energy ($U_{potential} + U_{kinetic}$) must not change with time • Figure labels: acceleration, old velocity, old position, new position, new velocity
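
The two update rules on this slide (a Beeman-type integrator) can be tried on a system small enough to check by hand. The sketch below applies them to a single particle on a spring, which is a stand-in chosen for illustration and not a protein force field; the way the first `a(t - dt)` is bootstrapped is also an assumption. It then compares the total energy before and after, as the slide's conservation requirement suggests.

```python
def beeman_oscillator(x0=1.0, v0=0.0, k=1.0, m=1.0, dt=0.01, steps=1000):
    """Integrate m*a = -k*x with the slide's update rules; return final (x, v)."""
    def accel(x):
        # a = F/m with F = -dU/dx and U = k*x^2/2
        return -k * x / m

    a_now = accel(x0)
    # Bootstrap a(t - dt) from a Taylor estimate of the previous position
    a_prev = accel(x0 - v0 * dt + 0.5 * a_now * dt * dt)
    x, v = x0, v0
    for _ in range(steps):
        x_new = x + v * dt + (4.0 * a_now - a_prev) * dt * dt / 6.0
        a_new = accel(x_new)
        v_new = v + (2.0 * a_new + 5.0 * a_now - a_prev) * dt / 6.0
        x, v, a_prev, a_now = x_new, v_new, a_now, a_new
    return x, v

def total_energy(x, v, k=1.0, m=1.0):
    """Potential plus kinetic energy of the oscillator."""
    return 0.5 * k * x * x + 0.5 * m * v * v

x, v = beeman_oscillator()
print(total_energy(1.0, 0.0), total_energy(x, v))
```

If the time step is small relative to the oscillation period, the two printed energies should agree closely; a growing difference is the usual sign that `dt` is too large.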

  39. Energy minimization • For a given protein, the energy depends on thousands of x,y,z Cartesian atomic coordinates; reaching a deep minimum is not trivial • Furthermore, we want to minimize the free energy, not just the potential energy. • Figure: energy vs. number of steps, from starting conformation to deep minimum

  40. Monte Carlo Simulation • Propose moves in torsion or Cartesian conformation space • Evaluate energy after every move, compute $\Delta E$ • Accept the new conformation based on the Metropolis criterion, $P_{accept} = \min(1, e^{-\Delta E / k_B T})$ • If run for infinite time, the simulated conformation follows the Boltzmann distribution • Many variations, including simulated annealing and other heuristic approaches.
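
A minimal sketch of the propose, evaluate, accept loop from this slide, assuming the standard Metropolis acceptance rule (downhill moves are always taken; uphill moves are taken with probability exp(-ΔE/kT)). The "conformation" here is a single coordinate with a quadratic energy, a toy chosen so the Boltzmann average is known in closed form; the function names, step size and seed are illustrative.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def sample(energy, x0, kT, steps, step_size, rng):
    """Run a Metropolis chain on one coordinate and return the visited values."""
    x, e = x0, energy(x0)
    samples = []
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)  # propose a move
        e_new = energy(x_new)                           # evaluate energy
        if metropolis_accept(e_new - e, kT, rng):       # accept or keep old state
            x, e = x_new, e_new
        samples.append(x)
    return samples

rng = random.Random(0)
xs = sample(lambda x: x * x, x0=0.0, kT=1.0, steps=20000, step_size=1.0, rng=rng)
mean_energy = sum(x * x for x in xs) / len(xs)
print(mean_energy)  # should approach kT/2 = 0.5 for this quadratic energy
```

Lowering `kT` gradually over the run turns the same loop into simulated annealing, one of the variations the slide mentions.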

  41. Scoring/energy functions • Need a way to select native-like conformations from non-native ones • Physics-based functions: electrostatics, van der Waals, solvation, bond/angle terms. • Knowledge-based scoring functions: • Derive information about atomic properties from a database of experimentally determined conformations • Common parameters include pairwise atomic distances and amino acid burial/exposure.

  42. Rosetta • D. Baker, U. Wash • Break sequence into short segments (7-9 AA) • Sample 3D from library of known segment structures, parallel computation • Use simulated annealing (Metropolis-type algorithm) for global optimization • Propose a change; if the energy is better, take it; otherwise take it with a smaller probability • Create 1000 structures, cluster and choose one representative from each cluster to submit

  43. Manual Improvements and Automation • Very often manual examination could improve prediction • Catch errors • Need domain knowledge • A. Murzin’s success at CASP2 • CAFASP: Critical Assessment of Fully Automated Structure Prediction • Murzin can’t play! • MetaServers: combine different methods to get consensus

  44. CAFASP Evaluation

  45. Structural Genomics • With more and more solved structures and novel folds, computational protein structure prediction is going to improve • Structural genomics: • Worldwide initiative for high-throughput determination of many protein structures • Especially, solve structures that have no homology to known structures

  46. Summary • Protein structures: 1st, 2nd, 3rd, 4th • Different DB: SwissProt, PDB and SCOP • Determine structure: X-ray crystallography • Protein structure prediction: • 2nd structure prediction • Homology modeling • Fold recognition • Ab initio • Evaluation: energy, RMSD, etc. • CASP and CAFASP contests • Manual improvement and combination of computational approaches work better • Structural Genomics, still a very difficult problem…

  47. Acknowledgement • Amy Keating • Michael Yaffe • Mark Craven • Russ Altman
