
PROTEIN SECONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee



Presentation Transcript


  1. PROTEIN SECONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee CS 882 Protein Folding Instructed by Professor Ming Li

  2. 0. OUTLINE • Introduction • Problem • Methods (4) • HMM Examples (3) • Segmentation HMM • Profile HMM • Conditional Random Field • Proposal

  3. 1. INTRODUCTION • Introduction * • Problem • Methods (4) • HMM Examples (3) • Segmentation HMM • Profile HMM • Conditional Random Field • Proposal

  4. 1. Genomics • Achievements in Genomics • BLAST (Basic Local Alignment Search Tool) • Most-cited paper published in the 1990s • Cited more than 15,000 times • Human Genome Project • Completed April 2003

  5. 1. Proteomics • Precedence to Proteomics • Protein Data Bank (PDB) • 40,132 structures • cited more than 6,000 times

  6. 1. Proteomics Number of Protein Structures in Protein Data Bank

  7. 1. Secondary Structure • Importance • The known secondary structure may be used as an input for the tertiary structure predictions.

  8. 1. Protein Structure • Primary Structure

  9. 1. Protein Structure • Secondary Structure

  10. 1. Secondary Structure • α-helix • Interaction between i and (i+4)th residue

  11. 1. Secondary Structure • β-sheet/strand • Parallel or Anti-parallel

  12. 1. Secondary Structure • Coil (loop)

  13. 1. Protein Structure • Tertiary Structure

  14. 1. Protein Structure • Super-Secondary (2.5) Structure

  15. 1. Protein Structure • Quaternary Structure

  16. 2. PROBLEM • Introduction • Problem * • Methods (4) • HMM Examples (3) • Segmentation HMM • Profile HMM • Conditional Random Field • Proposal

  17. 2. Secondary Structure • Problem • Given: • A primary sequence of amino acids • a1a2…an • Find: • Secondary structure of each ai as • α-helix = H • β-strand = E * • coil = C

  18. 2. Secondary Structure • Example • Given: • Primary Sequence • GHWIATRGQLIREAYEDYRHFSSECPFIP • Find: • Secondary Structure Element • CEEEEECHHHHHHHHHHHCCCHHCCCCCC • Note: segments

  19. 2. Prediction Quality • Three-state prediction accuracy • Q3 = (number of correctly predicted residues) / (total number of residues) • Per-state accuracies: Qα, Qβ, Qc • Q3 for random prediction is 33% • Theoretical limit: Q3 = 90%.
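The Q3 definition on this slide can be sketched in a few lines of Python. The function name `q3_scores` and the prediction string `PRED` are hypothetical, introduced only for illustration; `TRUE` is the structure string from the example slide.

```python
from collections import Counter

def q3_scores(true_ss, pred_ss):
    """Return overall Q3 and per-state accuracies for H/E/C label strings."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have equal length")
    if not true_ss:
        raise ValueError("strings must be non-empty")
    total = Counter(true_ss)          # residues observed in each true state
    correct = Counter()               # correctly predicted residues per state
    for t, p in zip(true_ss, pred_ss):
        if t == p:
            correct[t] += 1
    scores = {"Q3": sum(correct.values()) / len(true_ss)}
    for state in "HEC":
        # Per-state accuracy is undefined if the state never occurs.
        scores["Q" + state] = correct[state] / total[state] if total[state] else None
    return scores

TRUE = "CEEEEECHHHHHHHHHHHCCCHHCCCCCC"   # from the example slide
# Hypothetical prediction, made up for illustration (two residues differ).
PRED = "CC" + "E" * 4 + "C" + "H" * 10 + "C" * 4 + "HH" + "C" * 6

print(q3_scores(TRUE, PRED))
```

Here `QH`, `QE`, and `QC` play the role of the per-state accuracies; each divides by the number of residues that truly belong to that state.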

  20. 2. Prediction Quality • Segment Overlap (SOV) • Higher penalties for core segment regions • Matthews Correlation Coefficients (MCC) • Prediction errors made for each state
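A per-state Matthews correlation coefficient treats each state as a one-vs-rest binary problem. The sketch below assumes that reading; the function name `mcc_per_state` is hypothetical, and returning 0.0 when the denominator is zero is a convention chosen here, not something stated on the slide. SOV is more involved and is not sketched.

```python
import math

def mcc_per_state(true_ss, pred_ss, states="HEC"):
    """One-vs-rest Matthews correlation coefficient for each state."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have equal length")
    result = {}
    for s in states:
        tp = fp = tn = fn = 0
        for t, p in zip(true_ss, pred_ss):
            if t == s and p == s:
                tp += 1
            elif t != s and p == s:
                fp += 1
            elif t == s and p != s:
                fn += 1
            else:
                tn += 1
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        # Assumed convention: report 0.0 when the coefficient is undefined.
        result[s] = (tp * tn - fp * fn) / denom if denom else 0.0
    return result

print(mcc_per_state("HHEECC", "HHEECC"))
```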

  21. 2. True Structures • Three dimensional PDB data • DSSP (Dictionary of Secondary Structure of Proteins) • 8 states, each reduced to one of 3 states • H = alpha helix → H • G = 3₁₀-helix → H • I = 5-helix (pi helix) → H • E = extended strand (beta ladder) → E • B = residue in isolated beta-bridge → E • T = hydrogen-bonded turn → C • S = bend → C • C = coil → C • STRIDE
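The 8-to-3 state reduction listed on this slide is a simple lookup. A minimal sketch, assuming the input is a string that uses exactly the eight letters shown on the slide (the names `DSSP_TO_3` and `reduce_dssp` are hypothetical):

```python
# Mapping copied from the slide: 8 DSSP states to 3 prediction states.
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "C": "C",   # turn, bend, coil
}

def reduce_dssp(dssp):
    """Map an 8-state string to a 3-state string; unknown letters raise KeyError."""
    return "".join(DSSP_TO_3[ch] for ch in dssp)

print(reduce_dssp("HGIEBTSC"))
```

Other reduction schemes exist; this one follows the slide only.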

  22. 3. METHODS • Introduction • Problem • Methods (4) * • HMM Examples (3) • Segmentation HMM • Profile HMM • Conditional Random Field • Proposal

  23. 3. Sliding Window • Sliding-Window

  24. 3. Sliding Window • Sliding-Window

  25. 3. Sliding Window • Sliding-Window

  26. 3. Sliding Window • Sliding-Window
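The sliding-window slides carry only figures, so the following is a sketch of the general idea rather than the presenter's exact scheme: each residue is described by a fixed-width window of neighbours centred on it, and a classifier predicts the state of the central residue. The window width of 13 and the padding character are assumptions, and `sliding_windows` is a hypothetical name.

```python
def sliding_windows(sequence, width=13, pad="-"):
    """Return one window per residue, centred on that residue.

    The sequence is padded at both ends so terminal residues also get a
    full-width window. `width` must be odd so a central residue exists.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + width] for i in range(len(sequence))]

# Primary sequence from the example slide, with an assumed window of 5.
for window in sliding_windows("GHWIATRGQLIREAYEDYRHFSSECPFIP", width=5)[:3]:
    print(window)
```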

  27. 3. Four Methods • Statistical Method • Neural Network • Support Vector Machine • Hidden Markov Model

  28. 3a. Statistical Method • Propensity • Ex. Chou-Fasman 50~53%
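A propensity in this sense compares how often an amino acid occurs in a given state with how often that state occurs overall. The sketch below estimates such a table from labelled training pairs; it covers only the propensity table, not the full Chou-Fasman rule set. The function name `propensities` and the toy data are made up for illustration.

```python
from collections import Counter

def propensities(sequences, structures, states="HEC"):
    """Estimate P(state | residue) / P(state) from aligned sequence/label pairs."""
    aa_counts, state_counts, joint = Counter(), Counter(), Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("each sequence must match its structure in length")
        for aa, s in zip(seq, ss):
            aa_counts[aa] += 1
            state_counts[s] += 1
            joint[aa, s] += 1
    n = sum(aa_counts.values())
    table = {}
    for aa in aa_counts:
        table[aa] = {}
        for s in states:
            if state_counts[s] == 0:
                table[aa][s] = 0.0      # state never observed in the data
            else:
                table[aa][s] = (joint[aa, s] / aa_counts[aa]) / (state_counts[s] / n)
    return table

# Toy data, invented for illustration only.
print(propensities(["AAGG"], ["HHCC"]))
```

A value above 1 means the residue is over-represented in that state relative to the background.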

  29. 3b. Neural Network • Ex. PHD 71%

  30. 3c. SVM • Ex. PSIPRED 76~78%

  31. 3d. HMM Definition • State set Q • Output alphabet Σ

  32. 3d. HMM Definition • Transition probabilities • probability of entering the state p from state q • Tq(p) • ∀ q ∈ Q • ∀ p ∈ Q

  33. 3d. HMM Definition • Emission probabilities • probability of emitting each letter of Σ from state q • Eq(ai) • ∀ ai ∈ Σ • ∀ q ∈ Q
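The four components defined on these slides (Q, Σ, T, E) can be held in a small container. This is a minimal sketch: the class name `HMM`, the `validate` method, and all numbers in the toy model are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class HMM:
    states: list       # state set Q
    alphabet: list     # output alphabet Σ
    transition: dict   # transition[q][p] = T_q(p)
    emission: dict     # emission[q][a]   = E_q(a)

    def validate(self, tol=1e-9):
        """Check that every row of T and E is a probability distribution."""
        for q in self.states:
            t_sum = sum(self.transition[q][p] for p in self.states)
            e_sum = sum(self.emission[q][a] for a in self.alphabet)
            if not math.isclose(t_sum, 1.0, abs_tol=tol):
                raise ValueError(f"transitions out of {q!r} sum to {t_sum}")
            if not math.isclose(e_sum, 1.0, abs_tol=tol):
                raise ValueError(f"emissions from {q!r} sum to {e_sum}")
        return True

# Toy two-state model with made-up probabilities.
toy = HMM(
    states=["H", "C"],
    alphabet=["A", "G"],
    transition={"H": {"H": 0.8, "C": 0.2}, "C": {"H": 0.2, "C": 0.8}},
    emission={"H": {"A": 0.9, "G": 0.1}, "C": {"A": 0.1, "G": 0.9}},
)
print(toy.validate())
```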

  34. 3d. HMM Decoding • Problem • Given: • HMM = (Q,Σ,E,T) and • Sequence S • Where S = S1, S2, …, Sn • Find: • Most probable path of states traversed to generate S • Where X = X1, X2, …, Xn = state sequence

  35. 4. HMM Decoding • Optimize • Pr [ S , X ] • X = X1, X2, …, Xn = state sequence • S = S1, S2, …, Sn • Pr [ S | X ]

  36. 4. HMM Decoding • Dynamic programming • Memoryless • Pr[X1…Xn, S1…Sn] = Pr[X1…Xn-1, S1…Sn-1] · T_{Xn-1}(Xn) · E_{Xn}(Sn)
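The dynamic-programming recurrence for decoding is commonly implemented as the Viterbi algorithm. The sketch below works in log space to avoid underflow. The slides define the HMM as (Q, Σ, E, T) without an initial distribution, so the `start` argument is an assumption, as are the function name `viterbi` and every number in the toy model.

```python
import math

def _log(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, start, trans, emit):
    """Return (most probable state path, its log joint probability)."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    # score[i][q]: best log-probability of any path ending in state q at position i.
    score = [{q: _log(start[q]) + _log(emit[q][obs[0]]) for q in states}]
    back = [{}]   # back[i][p]: predecessor of p on the best path (unused at i = 0)
    for sym in obs[1:]:
        prev, col, ptr = score[-1], {}, {}
        for p in states:
            best_q = max(states, key=lambda q: prev[q] + _log(trans[q][p]))
            col[p] = prev[best_q] + _log(trans[best_q][p]) + _log(emit[p][sym])
            ptr[p] = best_q
        score.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda q: score[-1][q])
    path = [last]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    path.reverse()
    return "".join(path), score[-1][last]

# Toy two-state model with made-up probabilities.
STATES = ["H", "C"]
START = {"H": 0.5, "C": 0.5}
TRANS = {"H": {"H": 0.8, "C": 0.2}, "C": {"H": 0.2, "C": 0.8}}
EMIT = {"H": {"A": 0.9, "G": 0.1}, "C": {"A": 0.1, "G": 0.9}}

best_path, log_prob = viterbi("AAGG", STATES, START, TRANS, EMIT)
print(best_path, math.exp(log_prob))
```

The running time is O(n·|Q|²) for a sequence of length n, because each position considers every pair of previous and current states.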

  37. 4. HMM EXAMPLES • Introduction • Problem • Methods (4) • HMM Examples (3) * • Segmentation HMM • Profile HMM • Conditional Random Field • Proposal

  38. 4a. SEMI-HMM • Introduction • Problem • Methods (4) • HMM Examples (3) • Semi-HMM * • Profile HMM • Conditional Random Field • Proposal

  39. 4a. Semi-HMM • Definition • Each state can emit a sequence • Move emission probabilities into states • Model secondary structure segments

  40. 4a. Segmentation • Sequence Segments

  41. 4a. Segmentation • Sequence Segments

  42. 4a. Segmentation • Sequence Segments • T = secondary structural type of the segment, {H, E, L} • S = end position of each individual structural segment • R = known amino acid sequence

  43. 4a. Segmentation • Sequence Segments • T2 = E = β-strand • S2 = 9 • R2 = R[S1 + 1 : S2] (residues S1 + 1 through S2)
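The (S, T) segment notation is equivalent to a per-residue label string, and converting between the two makes the indexing concrete. In this sketch, segment ends are 1-based inclusive positions as on the slide; the function names and the example with S1 = 3 are assumptions made for illustration.

```python
def segments_to_labels(ends, types):
    """Expand segment end positions S and types T into a per-residue label string."""
    labels, start = [], 0
    for end, t in zip(ends, types):
        if end <= start:
            raise ValueError("segment ends must be strictly increasing")
        labels.append(t * (end - start))   # segment covers residues start+1 .. end
        start = end
    return "".join(labels)

def labels_to_segments(labels):
    """Collapse a per-residue label string into (ends, types)."""
    ends, types = [], []
    for i, ch in enumerate(labels, start=1):   # 1-based positions
        if types and types[-1] == ch:
            ends[-1] = i                       # extend the current segment
        else:
            types.append(ch)
            ends.append(i)
    return ends, types

# Hypothetical segmentation in which the second segment is a strand ending at 9.
S, T = [3, 9, 11], ["L", "E", "H"]
print(segments_to_labels(S, T))
```

With 0-based Python slicing, the residues of segment j (counting from 1) are `R[S[j-2]:S[j-1]]`, taking the start as 0 for the first segment.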

  44. 4a. Bayesian • Bayesian Formulation • R = Sequence of ALL amino acid residues • S = End of the segments • T = Secondary structural type of the segments • {H, E, L}

  45. 4a. Bayesian • Bayesian Formulation • P(S, T | R) ∝ P(R | S, T) · P(S, T) • Likelihood: P(R | S, T) • Prior probability: P(S, T) • Normalizing constant P(R) is the same for all (S, T), so it is dropped

  46. 4a. Bayesian • Likelihood • m = Total number of segments • Sj = End of the jth segment • Tj = Secondary structural type of the jth segment
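The likelihood formula itself did not survive in this transcript. In segment-based models of this kind, the likelihood is commonly factored into one term per segment; the expression below is a sketch under the assumption that segments are conditionally independent given (S, T), with S_0 = 0 introduced here for convenience. It may not match the slide's exact form.

```latex
\[
P(R \mid S, T) \;=\; \prod_{j=1}^{m}
P\!\left(R_{[S_{j-1}+1\,:\,S_j]} \;\middle|\; S, T\right),
\qquad S_0 = 0 .
\]
```

Each factor scores the residues of one segment, from position S_{j-1} + 1 through S_j, under the model for its type T_j.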

  47. 4a. Bayesian • Likelihood

  48. 4a. Bayesian • Likelihood

  49. 4a. Bayesian • Likelihood • Segment positions: N-terminus, Internal, C-terminus

  50. 4a. BSPPS • Bayesian Segmentation PPS
