
Protein Analysis 7 (Proteiinianalyysi 7)




  1. Protein Analysis 7 (Proteiinianalyysi 7): Prediction of three-dimensional structure http://www.bioinfo.biocenter.helsinki.fi/downloads/teaching/spring2006/proteiinianalyysi

  2. From sequence to structure • comparative modelling • prediction of a 1-dimensional state (class) from sequence • recognition of a 3-dimensional structure from a given library (fold recognition) • ab initio prediction of 3-dimensional structure

  3. Motivation • Protein structure determines protein function • For the majority of proteins the structure is not known

  4. Divergence of common cores • Fraction in core decreases with increasing sequence divergence • Curve fitted to data for homologous families Chothia & Lesk (1986)

  5. Steps in comparative modelling • Find suitable template(s) • Build alignment between target and template(s) • Build model(s) • Replace sidechains • Resolve conflicts in the structure • Model loops (regions without an alignment) • Evaluate and select model(s)

  6. State of the art in homology modelling • Template search: (iterative) sequence database searches (PSI-BLAST) • Alignment step: multiple alignment of close to fairly distant homologues • Modelling step: rigid body assembly, segment matching, satisfaction of spatial restraints

  7. An alignment defines structurally equivalent positions! Template structure Template sequence Alignment Target sequence Model

  8. The crucial importance of the alignment Template sequence Template structure Alignment Target sequence Model

  9. Modelling by spatial restraints • Generate many restraints: • Homology-derived restraints: distances and angles between aligned positions should be similar • Stereochemical restraints: bond lengths, bond angles, dihedral angles, nonbonded atom-atom contacts • Model derived by minimizing violations of the restraints Modeller: Sali & Blundell (1993)
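
The idea of deriving a model by minimizing restraint violations can be illustrated with a toy example. The sketch below is not Modeller's method: it assumes only harmonic distance restraints and plain gradient descent, and the coordinates, target distances, step size and function names are all hypothetical. The real program's restraint functions and optimizer are more elaborate.

```python
import math

def restraint_energy(coords, restraints):
    """Sum of harmonic penalties (d - d0)^2 over all distance restraints."""
    total = 0.0
    for i, j, d0 in restraints:
        d = math.dist(coords[i], coords[j])
        total += (d - d0) ** 2
    return total

def minimize(coords, restraints, step=0.05, iterations=2000):
    """Plain gradient descent on the restraint energy (toy optimizer)."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grads = [[0.0] * len(p) for p in coords]
        for i, j, d0 in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points; skip
            factor = 2.0 * (d - d0) / d
            for k in range(len(coords[i])):
                delta = coords[i][k] - coords[j][k]
                grads[i][k] += factor * delta
                grads[j][k] -= factor * delta
        for p, g in zip(coords, grads):
            for k in range(len(p)):
                p[k] -= step * g[k]
    return coords

# Hypothetical toy input: three points and target distances (arbitrary units).
start = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (0.3, 1.5, 0.1)]
restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 5.5)]
model = minimize(start, restraints)
```

Comparing `restraint_energy(start, restraints)` with `restraint_energy(model, restraints)` shows how far the optimization has reduced the violations.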

  10. Loop modelling • Exposed loop regions are usually more variable than the protein core • Often very important for protein function • Loops longer than 5 residues are difficult to build • Mini-protein folding problem

  11. Model evaluation • Check of stereochemistry • bond lengths & angles, peptide bond planarity, side-chain ring planarity, chirality, torsion angles, clashes • Check of spatial features • hydrophobic core, solvent accessibility, distribution of charged groups, atom-atom-distances, atomic volumes, main-chain hydrogen bonding • 3D profiles/mean force potentials • residue environment
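
One of the stereochemical checks listed above, the search for clashes, can be sketched in a few lines. This is only an illustration under stated assumptions: the 2.0 Å cutoff, the rule of skipping atoms in the same or adjacent residues, and the input layout are hypothetical choices, not taken from any particular validation program.

```python
import math
from itertools import combinations

def find_clashes(atoms, min_distance=2.0, skip_neighbors=1):
    """Return index pairs of atoms that lie closer than min_distance.

    atoms: list of (residue_index, (x, y, z)) tuples.
    Pairs whose residues are at most skip_neighbors apart are ignored,
    because bonded atoms are legitimately close.
    """
    clashes = []
    for (a, (res_a, xyz_a)), (b, (res_b, xyz_b)) in combinations(enumerate(atoms), 2):
        if abs(res_a - res_b) <= skip_neighbors:
            continue
        if math.dist(xyz_a, xyz_b) < min_distance:
            clashes.append((a, b))
    return clashes

# Hypothetical toy coordinates in Å.
atoms = [
    (1, (0.0, 0.0, 0.0)),
    (2, (1.5, 0.0, 0.0)),
    (5, (0.5, 0.0, 0.0)),
    (9, (10.0, 0.0, 0.0)),
]
clashes = find_clashes(atoms)
```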

  12. Knowledge-based mean force potentials • Compute typical atomic/residue environments based on known protein structures Melo & Feytmans (1997)

  13. Modelling a transcription factor • Sequence from different species • Is binding to ligand conserved?

  14. Ligand binding domain • hydrogen bonds to ligand • homoserine lactone moiety binding • acyl moiety binding

  15. DNA binding domain • Linker

  16. New Loop Template Target Variable loops MODELLER output

  17. Ligand binding pocket

  18. Errors in comparative modelling • Side chain packing • Distortions and shifts • Loops • Misalignments • Incorrect template True structure Template Model Marti-Renom et al. (2000)

  19. Modelling accuracy Marti-Renom et al. (2000)

  20. Applications of homology modelling Marti-Renom et al. (2000)

  21. Structural genomics • Post-genomics: • many new sequences, no function • Aim: a structure for every protein • High-throughput structure determination • robotics • standard protocols for cloning/expression/crystallization

  22. Structural coverage high quality models Complete models Total = 43 % Vitkup et al. (2001)

  23. Target selection

  24. Fold recognition - Assumption • Native structure is the global minimum energy conformation • So, need • Discriminating energy function • Conformation generator • Backbone from homologous template (comparative modelling) • Backbone from analogous template (fold recognition) • Comprehensive sampling (ab initio)

  25. Fold recognition steps • Template library • Known structures from Protein Data Bank • Fold classification suggests a limited number of fold types • Score = sequence-structure fitness • Environmental preferences of amino acids • Boltzmann engine • Search problem = alignment • Complicated with pair potentials • Significance of best score in database search • Reference state

  26. Potentials of mean force • “Boltzmann engine” • In thermodynamic equilibrium, particles are partitioned between states proportionally to exp(-ΔG/kT) • Effective energy = negative logarithm of the equilibrium constant • Count occurrences per state • Radial distribution of aa pairs (Sippl)
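
The "Boltzmann engine" can be sketched as a count-to-energy conversion. The example below assumes that the effective energy of a state is the negative logarithm of its observed frequency relative to a reference frequency, in units of kT. The pseudocount, the distance bins and all counts are hypothetical and only illustrate the calculation.

```python
import math
from collections import Counter

def mean_force_potential(observed_counts, reference_counts, pseudocount=1.0):
    """Effective energy per state, E = -ln(f_obs / f_ref), in units of kT."""
    states = set(observed_counts) | set(reference_counts)
    # Pseudocounts (an assumption here) avoid log(0) for unseen states.
    n_obs = sum(observed_counts.values()) + pseudocount * len(states)
    n_ref = sum(reference_counts.values()) + pseudocount * len(states)
    energies = {}
    for state in states:
        f_obs = (observed_counts.get(state, 0) + pseudocount) / n_obs
        f_ref = (reference_counts.get(state, 0) + pseudocount) / n_ref
        energies[state] = -math.log(f_obs / f_ref)
    return energies

# Hypothetical counts of pair distances for one amino acid pair (observed)
# and for all pairs pooled together (reference state).
observed = Counter({"0-4A": 30, "4-8A": 50, "8-12A": 20})
reference = Counter({"0-4A": 10, "4-8A": 50, "8-12A": 40})
energies = mean_force_potential(observed, reference)
```

States that occur more often than in the reference come out with negative (favourable) energy, and depleted states with positive energy.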

  27. Structural environment • Single-residue preferences 20 x 3 x 3 x 3 • Helix, strand, coil • Accessibility • Contact area (indirectly codes for aa type) • Contact pair potentials • Atomic contacts within 4 Å • C-beta atoms within 7 Å • Secondary structure of residues i and j • 3 x 20 x 3 x 20 = 3600 preferences

  28. Information content Arg-Asp helix-helix (dashed) Arg-Asp strand-strand (solid) Arg-Asp (dotted)

  29. Threading algorithms • Dynamic programming • Simple • “frozen approximation” • Read sequence-dependent environment from template (1st round), then from aligned target sequence • Stochastic optimization (Monte Carlo) • Pair potentials • Exhaustive search • Simplify search space (e.g., ignore loops)
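
The simple dynamic-programming case can be sketched as a global alignment of a sequence against the structural environments of a template, using only single-residue terms. This is a minimal illustration, not any published threading program: the environment alphabet (H, E, C), the fitness table, the mismatch default and the linear gap penalty are all hypothetical, and pair potentials are left out because they break the simple recursion.

```python
def thread(sequence, environments, fitness, mismatch=-0.5, gap=-1.0):
    """Globally align a sequence to template environments.

    Returns (score, pairs), where pairs lists aligned
    (sequence_index, template_index) positions.
    """
    n, m = len(sequence), len(environments)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            fit = fitness.get((sequence[i - 1], environments[j - 1]), mismatch)
            options = [
                (score[i - 1][j - 1] + fit, "diag"),  # residue placed in environment
                (score[i - 1][j] + gap, "up"),        # residue left unaligned
                (score[i][j - 1] + gap, "left"),      # template position skipped
            ]
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])
    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs

# Hypothetical preferences: A and L like helix (H), V and I strand (E), G coil (C).
fitness = {("A", "H"): 1.0, ("L", "H"): 1.0, ("V", "E"): 1.0,
           ("I", "E"): 1.0, ("G", "C"): 1.0}
best_score, pairs = thread("ALVIG", "HHEEC", fitness)
```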

  30. Prospect model (Xu & Xu) $E_{total} = v_{mutate} E_{mutate} + v_{single} E_{single} + v_{pair} E_{pair} + v_{gap} E_{gap}$ Weights $v$ optimized on training set
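
Assuming the four terms are combined as a weighted sum, the total energy of one threading can be computed as below. The function name, the term values and the weights are hypothetical placeholders; the fitted weights are not given on the slide.

```python
TERMS = ("mutate", "single", "pair", "gap")

def total_energy(energies, weights):
    """Weighted sum of the four energy terms for one threading."""
    return sum(weights[name] * energies[name] for name in TERMS)

# Hypothetical term values and weights, for illustration only.
energies = {"mutate": -10.0, "single": -4.0, "pair": -2.0, "gap": 3.0}
weights = {"mutate": 1.0, "single": 0.5, "pair": 0.25, "gap": 2.0}
e_total = total_energy(energies, weights)
```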

  31. Prospect - segmentation • Finds optimal threading fairly efficiently • Topological complexity • No gaps in secondary structure elements • Pair energy term only evaluated between secondary structure elements

  32. Prospect - observations • Mutation energy is the most important • Single-residue terms with profile information generate reasonably good alignments for ~2/3 of test cases • The pairwise energy term can thus be ignored during the search for optimal alignment, but is used in evaluating the fold recognition

  33. Performance comparison
      Method       Family only       Superfamily       Fold only
                   Top 1    Top 5    Top 1    Top 5    Top 1    Top 5
      Using pair potential
      PROSPECT     84.1     88.2     52.6     64.8     27.7     50.3
      Using dynamic programming, structural environment
      FUGUE        82.2     85.8     41.9     53.2     12.5     26.8
      THREADER     49.2     58.9     10.8     24.7     14.6     37.7
      Using sequence similarity only
      PSI-BLAST    71.2     72.3     27.4     27.9      4.0      4.7
      HMMER        67.7     73.5     20.7     31.3      4.4     14.6
      SAMT98       70.1     75.4     28.3     38.9      3.4     18.7
      BLASTLINK    74.6     78.9     29.3     40.6      6.9     16.5
      SSEARCH      68.6     75.7     20.7     32.5      5.6     15.6

  34. Threading score - significance • Target sequence – fold library • Each threading aligns a different sub-sequence • Compute Z-score for each by ungapped threading on large decoy (Sippl) • “Reverse threading” • Design optimal sequence for a given fold
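
The Z-score step can be sketched as follows: the sequence is slid without gaps along a long decoy chain, each offset gives one background score, and the score of interest is expressed in standard deviations from the background mean. The function names, the single-residue scoring and the toy numbers are assumptions for illustration. Here a higher score means a better fit; with energies the sign convention is reversed.

```python
import statistics

def ungapped_scores(sequence, decoy_environments, fitness):
    """Score the sequence at every ungapped offset along a decoy chain."""
    n = len(sequence)
    scores = []
    for offset in range(len(decoy_environments) - n + 1):
        window = decoy_environments[offset:offset + n]
        scores.append(sum(fitness.get((aa, env), 0.0)
                          for aa, env in zip(sequence, window)))
    return scores

def z_score(score, background_scores):
    """Number of standard deviations between score and the background mean."""
    mean = statistics.mean(background_scores)
    sd = statistics.pstdev(background_scores)
    return (score - mean) / sd

# Hypothetical toy data.
toy_fitness = {("A", "H"): 1.0, ("L", "H"): 1.0}
background = ungapped_scores("AL", "HCH", toy_fitness)
z = z_score(5.0, [1.0, 2.0, 3.0])
```

Because each threading aligns a different sub-sequence, raw scores are not comparable across folds; normalizing each one against its own background is what makes the ranking meaningful.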

  35. Incorrect self-threading

  36. Fold recognized

  37. Fold recognized Poor alignment of residues

  38. Ab initio prediction • HMMSTR/I-sites/Rosetta: HMMSTR is a Hidden Markov Model based on protein STRucture. Each Markov state in this model represents a position in one of the I-sites motifs. HMMSTR can predict local structure (as backbone angles), secondary structure, and supersecondary structure (edge versus middle strand, hairpin versus diverging turn). • I-sites Library: I-sites is a library of folding initiation site motifs, which are sequence motifs that correlate with particular local structures such as beta hairpins and helix caps. I-sites can be used to predict local structure, or to predict which parts of a protein are likely to fold early, initiating folding.

  39. Folding is 2-state: Unfolded ↔ Folded • Intermediates are not observed, but...

  40. Nucleation sites something happens first...

  41. Early folding events might be recorded in the database • Short, recurrent sequence patterns could be folding initiation sites • Non-homologous proteins sharing a recurrent part (MQTIFF): HDFPIEGGDSPMQTIFFWSNANAKLSHGY CPYDNIWMQTIFFNQSAAVYSVLHLIFLT IDMNPQGSIEMQTIFFGYAESAELSPVVNFLEEMQTIFFISGFTQTANSD INWGSMQTIFFEEWQLMNVMDKIPSIFNESKKKGIAMQTIFFILSGR PPPMQTIFFVIVNYNESKHALWCSVD PWMWNLMQTIFFISQQVIEIPSMQTIFFVFSHDEQMKLKGLKGA • Nature has selected for these patterns because they speed folding.
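
Finding short recurrent patterns in a set of sequences can be sketched as a k-mer count. The example below uses the sequence strings exactly as they appear in this transcript and reports every 6-residue segment present in all of them. The function name, the fixed length k = 6 and the exact-match criterion are assumptions. The I-sites library itself is built from sequence profiles, not from exact matches.

```python
from collections import defaultdict

def recurrent_kmers(sequences, k, min_sequences):
    """Map each k-mer to the number of sequences containing it,
    keeping only k-mers found in at least min_sequences sequences."""
    seen = defaultdict(set)
    for index, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            seen[seq[start:start + k]].add(index)
    return {kmer: len(ids) for kmer, ids in seen.items()
            if len(ids) >= min_sequences}

# Sequence strings copied from the slide transcript.
sequences = [
    "HDFPIEGGDSPMQTIFFWSNANAKLSHGY",
    "CPYDNIWMQTIFFNQSAAVYSVLHLIFLT",
    "IDMNPQGSIEMQTIFFGYAESAELSPVVNFLEEMQTIFFISGFTQTANSD",
    "INWGSMQTIFFEEWQLMNVMDKIPSIFNESKKKGIAMQTIFFILSGR",
    "PPPMQTIFFVIVNYNESKHALWCSVD",
    "PWMWNLMQTIFFISQQVIEIPSMQTIFFVFSHDEQMKLKGLKGA",
]
hits = recurrent_kmers(sequences, k=6, min_sequences=len(sequences))
```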

  42. How to read an I-sites motif profile
