Protein Structure, Structure Classification and Prediction

Protein Structure, Structure Classification and Prediction. Bioinformatics X3, January 2005. P. Johansson, D. Madsen, Dept. of Cell & Molecular Biology, Uppsala University.


Presentation Transcript


  1. Protein Structure, Structure Classification and Prediction • Bioinformatics X3, January 2005 • P. Johansson, D. Madsen • Dept. of Cell & Molecular Biology, Uppsala University

  2. Overview • Introduction to proteins, structure & classification • Protein Folding • Experimental techniques for structure determination • Structure prediction

  3. Proteins • Proteins play a crucial role in virtually all biological processes and cover a broad range of functions. • The activity of an enzyme, or the function of any protein, is governed by its three-dimensional structure.

  4. 20 amino acids - the building blocks

  5. The Amino Acids

  6. Hydrophilic or hydrophobic? • Virtually all soluble proteins feature a hydrophobic core surrounded by a hydrophilic surface • But the peptide backbone is inherently polar • Solution: neutralize potential H-bond donors and acceptors using ordered secondary structure

  7. Secondary Structure: α-helix

  8. Secondary Structure: α-helix • 3.6 residues / turn • Axial dipole moment • Not proline & glycine • Protein surfaces
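
The figure of 3.6 residues per turn fixes the angular step between consecutive residues at 360° / 3.6 = 100°. A minimal Python sketch of that arithmetic follows. The rise of 1.5 Å per residue is the commonly quoted textbook value and is an assumption here, since the slide does not state it; the function name `helix_position` is likewise only illustrative.

```python
RESIDUES_PER_TURN = 3.6   # from the slide
RISE_PER_RESIDUE = 1.5    # in Å; assumed textbook value, not given on the slide


def helix_position(index):
    """Return (angle around the helix axis in degrees, height along the axis in Å)."""
    angle = (index * 360.0 / RESIDUES_PER_TURN) % 360.0
    height = index * RISE_PER_RESIDUE
    return angle, height


# Distance advanced along the axis per full turn (the helix pitch), in Å.
pitch = RESIDUES_PER_TURN * RISE_PER_RESIDUE

for i in range(5):
    print(i, helix_position(i))
```

Residues i, i+3 and i+4 end up on roughly the same side of the helix, which is why helices can present one hydrophobic and one hydrophilic face.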

  9. Secondary Structure: β-sheets

  10. Secondary Structure: β-sheets • Parallel or antiparallel • Alternating side-chains • No mixing • Loops often have polar amino acids

  11. Structural classification • Databases • SCOP, ‘Structural Classification of Proteins’, manual classification • CATH, ‘Class Architecture Topology Homology’, based on the SSAP algorithm • FSSP, ‘Families of Structurally Similar Proteins’, based on the DALI algorithm • PClass, ‘Protein Classification’, based on the LOCK and 3Dsearch algorithms

  12. Structural classification, CATH • Class, four types: • Mainly α • α/β structures • Mainly β • No secondary structure • Architecture (fold) • Topology (superfamily) • Homology (family)

  13. Structural classification..

  14. Structural classification.. • Two types of algorithms • Inter-molecular, 3D, rigid body: structural alignment in a common coordinate system (hard), e.g. the VAST and LOCK algorithms • Intra-molecular, 2D, internal geometry: structural alignment using internal distances and angles, e.g. the DALI, STRUCTURAL and SSAP algorithms

  15. Structural classification, SSAP • SSAP, ‘Sequential Structure Alignment Program’ • Basic idea: the similarity between residue i in molecule A and residue k in molecule B is characterised in terms of their structural surroundings • This similarity can be quantified as a score, S_ik • Based on this similarity score and a specified gap penalty, dynamic programming is used to find the optimal structural alignment
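
The final step described above, dynamic programming over a residue-by-residue score matrix with a gap penalty, can be sketched in Python as follows. This is a generic global alignment with a linear gap penalty, not SSAP's actual implementation; the function name `align`, the toy score matrix and the gap value are all assumptions made for illustration.

```python
def align(score, gap):
    """Globally align two chains given score[i][k] and a linear gap penalty.

    Returns (total score, list of aligned (i, k) index pairs).
    """
    n, m = len(score), len(score[0])
    # best[i][k]: best score for the first i residues of A against the first k of B
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for k in range(1, m + 1):
        best[0][k] = -gap * k
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            best[i][k] = max(
                best[i - 1][k - 1] + score[i - 1][k - 1],  # pair i with k
                best[i - 1][k] - gap,                      # residue i of A unpaired
                best[i][k - 1] - gap,                      # residue k of B unpaired
            )
    # Trace back from the corner to recover which residues were paired.
    pairs = []
    i, k = n, m
    while i > 0 and k > 0:
        if best[i][k] == best[i - 1][k - 1] + score[i - 1][k - 1]:
            pairs.append((i - 1, k - 1))
            i, k = i - 1, k - 1
        elif best[i][k] == best[i - 1][k] - gap:
            i -= 1
        else:
            k -= 1
    return best[n][m], pairs[::-1]


# Toy S_ik matrix (made up): A has 3 residues, B has 4.
toy_scores = [
    [5, 0, 0, 0],
    [0, 0, 5, 0],
    [0, 0, 0, 5],
]
total, pairs = align(toy_scores, gap=1)
print(total, pairs)
```

In the toy case the best alignment skips the second residue of B, paying one gap penalty to keep all three high-scoring pairs.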

  16. Structural classification, SSAP • Figure: the structural neighborhood of residue i in A compared to that of residue k in B

  17. Structural classification, SSAP.. • Distance between residues i and j in molecule A: $d^A_{i,j}$ • Similarity for two pairs of residues, ij in A and kl in B: $s_{ij,kl} = \frac{a}{|d^A_{i,j} - d^B_{k,l}| + b}$, where a and b are constants • Similarity between residue i in A and residue k in B: $S_{i,k} = \sum_{m=-n,\, m \neq 0}^{n} \frac{a}{|d^A_{i,i+m} - d^B_{k,k+m}| + b}$ • Idea: $S_{i,k}$ is large if the distances from residue i in A to its 2n nearest neighbours are similar to the corresponding distances around k in B
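
The residue similarity score can be illustrated with a short Python sketch. It assumes the commonly cited form in which each pair of distances contributes a / (|d_A - d_B| + b), summed over the n sequence neighbours on each side of the residue. The values of `a`, `b` and `n`, the toy coordinates and the handling of chain ends are all arbitrary choices for illustration, not the published SSAP parameters.

```python
import math


def residue_similarity(coords_a, i, coords_b, k, n=2, a=50.0, b=2.0):
    """S_ik: sum over offsets m = -n..n (m != 0) of a / (|d_A - d_B| + b).

    d_A is the distance from residue i to residue i+m in A, and d_B the
    distance from residue k to residue k+m in B.
    """
    total = 0.0
    for m in range(-n, n + 1):
        if m == 0:
            continue
        ia, kb = i + m, k + m
        if not (0 <= ia < len(coords_a) and 0 <= kb < len(coords_b)):
            continue  # neighbour runs off a chain end; skipped (an assumption)
        d_a = math.dist(coords_a[i], coords_a[ia])
        d_b = math.dist(coords_b[k], coords_b[kb])
        total += a / (abs(d_a - d_b) + b)
    return total


# Toy coordinates (made up): a straight chain, and a stretched copy of it.
chain = [(3.8 * j, 0.0, 0.0) for j in range(5)]
stretched = [(7.6 * j, 0.0, 0.0) for j in range(5)]

same = residue_similarity(chain, 2, chain, 2)
different = residue_similarity(chain, 2, stretched, 2)
print(same, different)
```

Identical surroundings give the maximum of 2n * a / b, and any mismatch in the distances lowers the score.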

  18. Structural classification, SSAP.. • Example: A: HSERAHVFIM.. (i=5), B: GQ-VMAC-NW.. (k=4) • This works well for small structures and local structural alignments • However, insertions and deletions cause problems → unrelated distances are compared • The real algorithm uses dynamic programming on two levels: first to find which distances to compare → S_ik, then to align the structures using these scores

  19. Experimental techniques for structure determination • X-ray Crystallography • Nuclear Magnetic Resonance spectroscopy (NMR) • Electron Microscopy/Diffraction • Free electron lasers ?

  20. X-ray Crystallography

  21. X-ray Crystallography.. • From small molecules to viruses • Information about the positions of individual atoms • Limited information about dynamics • Requires crystals

  22. NMR • Limited to molecules up to ~50 kDa (good quality up to 30 kDa) • Distances between pairs of hydrogen atoms • Lots of information about dynamics • Requires soluble, non-aggregating material • Assignment problem

  23. Electron Microscopy/ Diffraction • Low to medium resolution • Limited information about dynamics • Can use very small crystals (nm range) • Can be used for very large molecules and complexes

  24. Structure Prediction ? GPSRYIV…

  25. Protein Folding • Different sequence → Different structure • Free energy difference small due to large entropy decrease, ΔG = ΔH − TΔS
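
To see why the free energy difference ends up small, it helps to put numbers into the relation ΔG = ΔH − TΔS. The Python sketch below uses made-up, purely illustrative values for the enthalpy and entropy change on folding; they are not measurements for any particular protein.

```python
def delta_g(delta_h, temperature, delta_s):
    """Gibbs free energy change, dG = dH - T * dS.

    Units: delta_h in kJ/mol, temperature in K, delta_s in kJ/(mol*K).
    """
    return delta_h - temperature * delta_s


# Illustrative, made-up values for the unfolded -> folded transition:
dH = -250.0  # kJ/mol: favourable enthalpy from the interactions formed
dS = -0.70   # kJ/(mol*K): large entropy decrease on folding
T = 298.0    # K

dG = delta_g(dH, T, dS)
print(round(dG, 1))  # about -41.4 kJ/mol
```

Two large terms of opposite sign nearly cancel, leaving a net stability that is only a small fraction of either.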

  26. Structure Prediction • Why is structure prediction and especially ab initio calculations hard..? • Many degrees of freedom / residue • Remote noncovalent interactions • Nature does not go through all conformations • Folding assisted by enzymes & chaperones
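
A back-of-the-envelope calculation shows why exhaustive search is hopeless and why nature cannot be sampling every conformation. The Python sketch below assumes three states per residue and 10^13 conformations sampled per second; both figures are conventional illustrative choices, not values from the slides.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365


def exhaustive_search_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to visit each one once)."""
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    return conformations, seconds / SECONDS_PER_YEAR


conformations, years = exhaustive_search_years(100)
print(f"{conformations:.2e} conformations, {years:.2e} years")
```

Even for a 100-residue chain the search time exceeds any realistic timescale by many orders of magnitude, whereas real proteins fold in seconds or less.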

  27. Molecular dynamics • Ab initio calculations are used for smaller problems: • Calculation of affinity • Enzymatic pathways

  28. Sequence Classification rev. • Class : Secondary structure content • Fold : Major structural similarity. • Superfamily : Probable common evolutionary origin. • Family : Clear evolutionary relationship.

  29. Structure Prediction • Search sequence data banks for homologs • Search methods, e.g. BLAST, PSI-BLAST, FASTA… • Homolog in PDB..? IVTY…PGGG HYW…QHG

  30. Structure Prediction Multiple sequence / structure alignment • Contains more information than a single sequence for applications like homology modeling and secondary structure prediction • Gives location of conserved parts and residues likely to be buried in the protein core or exposed to solvent

  31. HFD fingerprint Multiple alignment example

  32. Secondary Structure Prediction • Statistical analysis (old-fashioned): • For each amino acid type, assign its ‘propensity’ to be in a helix, sheet, or coil. • Limited accuracy ~55-60%. • Random prediction ~38%. • Example: MTLLALGINHKTAP... predicted as CCEEEEEECCCCCC...
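
Accuracy figures like these are usually per-residue scores. The Python sketch below assumes that accuracy means the fraction of residues assigned the correct one of three states (helix H, strand E, coil C), a measure often called Q3. The predicted string is the one shown on the slide; the observed string is hypothetical and exists only to make the example run.

```python
def q3(predicted, observed):
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


predicted = "CCEEEEEECCCCCC"  # from the slide
observed = "CCCEEEEECCCCCC"   # hypothetical reference assignment

print(f"Q3 = {q3(predicted, observed):.2f}")
```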

  33. The Chou & Fasman Method • Each residue is classified as: • H/H, strong helix / strand former. • h/h, weak helix / strand former. • I, indifferent. • b/b, weak helix/strand breaker. • B/B, strong helix / strand breaker.

  34. The Chou & Fasman Method.. • Score each residue, separately for helix and for strand: H/h = 1, I = 0 or ½, B/b = -1 • Helix nucleation: score > 4 in a “window” of 6 residues • Strand nucleation: score > 3 in a “window” of 5 residues • Propagate until score < 1 in a 4-residue “window”

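  35. The Chou & Fasman Method.. Worked example for the sequence GPSRYIVTLANGK:

      Residue        G    P    S    R    Y    I    V    T    L    A    N    G    K
      Helix score   -1   -1    0    0   -1    1    1    0    1    1   -1   -1    1
      Strand score  -1   -1   -1   .5    1    1    1    1    1    0    0   -1   -1

      Helix, sums over windows of 6: -2 0 1 2 3 3 1 → no nucleation
      Strand, sums over windows of 5: -1.5 .5 2.5 4.5 5 4 3 1 -1 → nucleation
      Propagate, sums over windows of 4: -2.5 -.5 1.5 … 3 1 -1
      Result: GPSRYIVTLANGK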
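
The nucleation step of the worked example can be reproduced with a few lines of Python. The per-residue scores below are read off the slide as well as the transcript allows, so treat them as an assumption; the helper names `window_sums` and `nucleation_windows` are invented for this sketch, and the propagation step is not included.

```python
def window_sums(scores, width):
    """Sum of the scores in every contiguous window of the given width."""
    return [sum(scores[i:i + width]) for i in range(len(scores) - width + 1)]


def nucleation_windows(scores, width, threshold):
    """Start indices of windows whose summed score exceeds the threshold."""
    return [i for i, s in enumerate(window_sums(scores, width)) if s > threshold]


sequence = "GPSRYIVTLANGK"
helix = [-1, -1, 0, 0, -1, 1, 1, 0, 1, 1, -1, -1, 1]
strand = [-1, -1, -1, 0.5, 1, 1, 1, 1, 1, 0, 0, -1, -1]

# Helix: needs a sum > 4 in a window of 6 residues.
print(window_sums(helix, 6), nucleation_windows(helix, 6, 4))

# Strand: needs a sum > 3 in a window of 5 residues.
strand_starts = nucleation_windows(strand, 5, 3)
print(window_sums(strand, 5), strand_starts)
for start in strand_starts:
    print("strand nucleus:", sequence[start:start + 5])
```

No helix window exceeds 4, while three overlapping strand windows exceed 3, which matches the slide's conclusion of strand nucleation and no helix nucleation.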

  36. Modern methods • Neural networks (e.g. the PHD server): • Input: a number of protein sequences + secondary structure. • Output: a trained network that predicts secondary structure elements with ~70% accuracy. • Use many different methods and compare (e.g. the JPred server)!
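
The advice to run several methods and compare them can be made concrete with a simple per-residue majority vote. The Python sketch below is only an illustration of that idea and does not claim to be how JPred or any other server combines predictions; the three input strings are made up.

```python
from collections import Counter


def consensus(predictions):
    """Per-residue majority vote over equal-length secondary-structure strings.

    Ties go to the state that appears first among the tied ones, i.e. the
    one proposed by the earliest method in the list.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    voted = []
    for column in zip(*predictions):
        voted.append(Counter(column).most_common(1)[0][0])
    return "".join(voted)


# Hypothetical outputs of three different prediction methods for one sequence.
methods = ["CCEEEC", "CEEEEC", "CCEECC"]
print(consensus(methods))
```

Positions where the methods disagree are also worth inspecting directly, since disagreement is itself a signal of low confidence.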

  37. Summary • The function of a protein is governed by its structure • Different sequence → Different structure • PDB, Protein Data Bank • Secondary structure prediction is hard, tertiary structure prediction is even harder • Use homologs whenever possible, or several different methods, to assess quality
