
A Hidden Markov Model for Protein Secondary Structure Prediction



Presentation Transcript


  1. A Hidden Markov Model for Protein Secondary Structure Prediction Wei-Mou Zheng Institute of Theoretical Physics Academia Sinica PO Box 2735, Beijing 100080 zheng@itp.ac.cn

  2. Outline • Protein structure • A brief review of secondary structure prediction • Hidden Markov model: simple-minded • Hidden Markov model: realistic • Discussion • References

  3. Protein sequences are written in 20 letters (the 20 naturally occurring amino acid residues): AVCDE FGHIW KLMNY PQRST. Residue classes (figure legend): hydrophobic, charged (+/-), polar.

  4. Residues form a directed chain. (Figure labels: cis-, trans-.)

  5. RasMol ribbon diagram of GB1: helix (pink), sheets (yellow) and coil (grey). Hydrogen-bond network. 3D structure → secondary structure written in three letters: H, E, C. H : E : C = 34.9 : 21.8 : 43.3

  6. Bayes formula. Count of [...]. Generally, P(x, y) = P(x|y) P(y) = P(y|x) P(x), hence P(x|y) = P(y|x) P(x) / P(y).

  7. Protein sequence A = {a_i}, i = 1, 2, …, n. Secondary structure sequence S = {s_i}, i = 1, 2, …, n. Secondary structure prediction: 1D amino acid sequence → 1D secondary structure sequence. A problem studied for more than 30 years. Inference of S from A: P(S|A). 1. Simple Chou-Fasman approach: the Chou-Fasman propensity of each amino acid for each conformational state, plus an independence approximation.

  8. Parameter training. Propensities q(a,s). Counts (20x3) from a database: N(a,s); sum over a → N(s); sum over s → N(a); sum over a and s → N. q(a,s) = [N(a,s) N] / [N(a) N(s)].
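The propensity formula above can be sketched in a few lines. This is a minimal illustration, not the author's code: the function name `propensities` and the toy count table (2 residue types instead of 20) are assumptions made for the example.

```python
import numpy as np

def propensities(counts):
    """Return q(a,s) = N(a,s) * N / (N(a) * N(s)) for a count table N(a,s).

    counts: array of shape (n_residue_types, n_states), e.g. (20, 3).
    """
    counts = np.asarray(counts, dtype=float)
    n_a = counts.sum(axis=1, keepdims=True)  # N(a): sum over states s
    n_s = counts.sum(axis=0, keepdims=True)  # N(s): sum over residues a
    n = counts.sum()                         # N: sum over both
    return counts * n / (n_a * n_s)

# Hypothetical toy counts: rows are 2 residue types, columns are H, E, C.
toy_counts = [[30, 10, 20],
              [10, 30, 20]]
q = propensities(toy_counts)
print(q)
```

A value q(a,s) > 1 means residue a occurs in state s more often than expected if residue and state were independent; q(a,s) < 1 means less often.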

  9. 2. Garnier-Osguthorpe-Robson (GOR) window version. Conditional independence. Weight matrix (20x17)x3 for P(W|s). 3. Improved GOR (20x20x16x3, to include pair correlation).
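One way to read the window version: under conditional independence, P(W|s) factorizes into one term per window position, so a (20x17)x3 weight matrix suffices. The sketch below scores each site by log P(s) + Σ_k log P(a_{i+k} | s, k) over a 17-residue window. It is an assumption-laden illustration, not the original GOR information-theoretic formulation; the names `train_window_model` and `predict`, the pseudocount, and the toy data are all hypothetical.

```python
import numpy as np

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # 20 residue letters
STATES = "HEC"                  # helix, strand, coil
HALF = 8                        # window width 17 = 2 * 8 + 1

def train_window_model(seqs, structs, pseudocount=1.0):
    """Estimate log P(a at offset k | s) and log P(s) from labelled sequences."""
    counts = np.full((3, 2 * HALF + 1, 20), pseudocount)  # [state, offset, residue]
    prior = np.full(3, pseudocount)
    for seq, ss in zip(seqs, structs):
        for i, s in enumerate(ss):
            si = STATES.index(s)
            prior[si] += 1
            for k in range(-HALF, HALF + 1):
                j = i + k
                if 0 <= j < len(seq):  # skip window positions beyond the chain ends
                    counts[si, k + HALF, AMINO.index(seq[j])] += 1
    log_w = np.log(counts / counts.sum(axis=2, keepdims=True))
    log_prior = np.log(prior / prior.sum())
    return log_w, log_prior

def predict(seq, log_w, log_prior):
    """Assign each site the state with the highest window log-score."""
    out = []
    for i in range(len(seq)):
        score = log_prior.copy()
        for k in range(-HALF, HALF + 1):
            j = i + k
            if 0 <= j < len(seq):
                score += log_w[:, k + HALF, AMINO.index(seq[j])]
        out.append(STATES[int(np.argmax(score))])
    return "".join(out)

# Hypothetical toy training data, for illustration only.
log_w, log_prior = train_window_model(["AAAAAVVVVVGGGGG"], ["HHHHHEEEEECCCCC"])
pred = predict("AAAVVVGGG", log_w, log_prior)
print(pred)
```

Each site is scored independently here, which is why the HMM slides that follow add a Markov chain over the hidden states.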

  10. Hidden Markov Model (HMM): simple-minded. Bayesian formula: P(S|A) = P(S,A)/P(A) ∝ P(S,A) = P(A|S) P(S). Simple version: a Markov chain for the hidden sequence s_1 → s_2 → s_3 → …, emitting a_i at s_i according to P(a|s). Forward and backward functions.

  11. Initial conditions and recursion relations. Partition function. Linear algorithm: dynamic programming. Baum-Welch (sum) and Viterbi (max).
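The equations for this slide are not in the transcript. One standard form, chosen to agree with the pair probability Prob(s_i = s, s_{i+1} = s') on the following slide, is given here. The initial distribution π(s) is an assumed symbol that does not appear in the original; A_i and B_i are the forward and backward functions, t_{ss'} the transition probabilities, and Z the partition function.

```latex
\begin{aligned}
A_1(s) &= \pi(s)\, P(a_1 \mid s), &
A_{i+1}(s') &= \sum_{s} A_i(s)\, t_{ss'}\, P(a_{i+1} \mid s'), \\
B_n(s) &= 1, &
B_i(s) &= \sum_{s'} t_{ss'}\, P(a_{i+1} \mid s')\, B_{i+1}(s'), \\
Z &= P(A) = \sum_{s} A_n(s) = \sum_{s} A_i(s)\, B_i(s) & &\text{for any } i .
\end{aligned}
```

Each recursion step costs a fixed amount of work per site, so the total cost is linear in the sequence length n. Replacing the sums by maxima gives the Viterbi recursion for the single most probable state path.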

  12. Prob(s_i = s, s_{i+1} = s') = A_i(s) t_{ss'} P(a_{i+1}|s') B_{i+1}(s') / Z. Prob(s_{i:j})
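The pair probability above can be checked numerically. The sketch below assumes the usual forward and backward functions (A_i includes the emission at site i, and B_n = 1) and an initial distribution `pi`, none of which is spelled out in the transcript. All parameter values are made up for illustration, with a two-letter alphabet instead of 20. No rescaling is applied, so this unscaled version would underflow on long sequences.

```python
import numpy as np

def forward_backward(pi, t, e, obs):
    """Forward A[i, s], backward B[i, s] and partition function Z = P(A)."""
    n, k = len(obs), len(pi)
    A = np.zeros((n, k))
    B = np.ones((n, k))                      # B at the last site is 1
    A[0] = pi * e[:, obs[0]]
    for i in range(1, n):
        A[i] = (A[i - 1] @ t) * e[:, obs[i]]
    for i in range(n - 2, -1, -1):
        B[i] = t @ (e[:, obs[i + 1]] * B[i + 1])
    return A, B, A[-1].sum()

def pair_marginal(A, B, Z, t, e, obs, i):
    """Prob(s_i = s, s_{i+1} = s') as a matrix indexed [s, s'] (0-based i)."""
    return A[i][:, None] * t * (e[:, obs[i + 1]] * B[i + 1])[None, :] / Z

def viterbi(pi, t, e, obs):
    """Most probable state path: the same recursion with max in place of sum."""
    n, k = len(obs), len(pi)
    delta = np.log(pi) + np.log(e[:, obs[0]])
    back = np.zeros((n, k), dtype=int)
    log_t = np.log(t)
    for i in range(1, n):
        cand = delta[:, None] + log_t        # cand[s, s']
        back[i] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + np.log(e[:, obs[i]])
    path = [int(delta.argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    return path[::-1]

def joint(pi, t, e, obs, path):
    """P(S, A) for one explicit state path, used for brute-force checks."""
    p = pi[path[0]] * e[path[0], obs[0]]
    for i in range(1, len(obs)):
        p *= t[path[i - 1], path[i]] * e[path[i], obs[i]]
    return p

# Hypothetical parameters: states 0, 1, 2 stand for H, E, C.
pi = np.array([0.35, 0.22, 0.43])
t = np.array([[0.80, 0.05, 0.15],
              [0.05, 0.70, 0.25],
              [0.20, 0.15, 0.65]])
e = np.array([[0.7, 0.3],
              [0.2, 0.8],
              [0.5, 0.5]])
obs = [0, 0, 1, 1, 0]

A, B, Z = forward_backward(pi, t, e, obs)
pair = pair_marginal(A, B, Z, t, e, obs, 1)
best_path = viterbi(pi, t, e, obs)
print(Z, pair.sum(), best_path)
```

Summing the pair matrix over s' recovers the single-site probability A_i(s) B_i(s) / Z, which is a natural per-site measure of prediction confidence.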

  13. Hidden Markov Model: realistic • 1) Strong correlation in conformational states: at least two consecutive E and three consecutive H • Refined conformational states (243 → 75) • 2) Emission probabilities → improved window scores • Proportion of accurately predicted sites ~ 70% (compared with < 65% for prediction based on a single sequence) • No post-prediction filtering • Integrated (overall) estimation of refined conformational states • Measure of prediction confidence

  14. Discussion • An HMM using refined conformational states and window scores is efficient for protein secondary structure prediction. • A better scoring system should cover more of the correlation between conformation and sequence. • Combining homologous information will improve the prediction accuracy. • From secondary structure to 3D structure (structure codes: discretized 3D conformational states)

  15. References • Lawrence R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (1989) 257-286 • Burkhard Rost, Protein secondary structure prediction continues to rise, Journal of Structural Biology 134 (2001) 204-218

  16. The End

  17. (Figure: amino acid property diagram. Class labels: small, tiny, aliphatic, aromatic, hydrophobic, polar, positive, negative. Residue labels: P, G, I, A, V, L, C, S, N, T, D, Q, M, E, Y, K, F, H, R, W.)
