
Probabilistic Ensembles for Improved Protein Structure Determination

This paper presents a novel probabilistic ensemble method for improving protein structure determination using a combination of multiple models. The method involves running inference multiple times under different conditions to produce diverse estimates of each amino acid's location. Experimental results show significant improvements over standard approaches.

Presentation Transcript


  1. Probabilistic Ensembles for Improved Inference in Protein-Structure Determination • Ameet Soni* and Jude Shavlik • Dept. of Computer Sciences; Dept. of Biostatistics and Medical Informatics • Presented at the ACM International Conference on Bioinformatics and Computational Biology 2011

  2. Protein Structure Determination • Proteins are essential to most cellular functions • Structural support • Catalysis/enzymatic activity • Cell signaling • Protein structures determine function • X-ray crystallography is the main technique for determining structures

  3. Task Overview • Given: a protein sequence (e.g., SAVRVGLAIM...) and an electron-density map (EDM) of the protein • Do: automatically produce a protein structure that contains all atoms and is physically feasible

  4. Challenges & Related Work • Resolution is a property of the protein • Higher resolution: better quality • [Figure: resolution scale 1 Å, 2 Å, 3 Å, 4 Å, labeled with ARP/wARP, TEXTAL & RESOLVE, and Our Method: ACMI]

  5. Outline • Protein Structures • Prior Work on ACMI • Probabilistic Ensembles in ACMI (PEA) • Experiments and Results

  6. Outline • Protein Structures • Prior Work on ACMI • Probabilistic Ensembles in ACMI (PEA) • Experiments and Results

  7. Our Technique: ACMI • Phase 1: Perform Local Match, yielding the prior probability of each AA’s location • Phase 2: Apply Global Constraints, yielding the posterior probability of each AA’s location • Phase 3: Sample Structure, yielding all-atom protein structures • [Figure labels: b_{k-1}, b_k, b_{k+1}, b^{*1…M}]

  8. Results [DiMaio, Kondrashov, Bitto, Soni, Bingman, Phillips, and Shavlik, Bioinformatics 2007]

  9. ACMI Outline • Phase 1: Perform Local Match, yielding the prior probability of each AA’s location • Phase 2: Apply Global Constraints, yielding the posterior probability of each AA’s location • Phase 3: Sample Structure, yielding all-atom protein structures • [Figure labels: b_{k-1}, b_k, b_{k+1}, b^{*1…M}]

  10. Phase 2 – Probabilistic Model • ACMI models the probability of all possible traces using a pairwise Markov Random Field (MRF) • [Figure: MRF with nodes ALA1, GLY2, LYS3, LEU4, SER5]

  11. Probabilistic Model # nodes: ~1,000 # edges: ~1,000,000
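For readers unfamiliar with pairwise MRFs, the model on the two slides above can be written in the standard pairwise-MRF form. This is the generic textbook factorization, not a formula taken from the slides; the symbols $\psi_i$, $\psi_{ij}$, $N$, and $M$ (the density map) are notation assumed here for illustration.

```latex
% Generic pairwise MRF over amino-acid locations b_1, ..., b_N,
% conditioned on the electron-density map M (notation assumed).
P(b_1, \dots, b_N \mid M) \;\propto\;
  \prod_{i=1}^{N} \psi_i(b_i \mid M)
  \prod_{(i,j) \in E} \psi_{ij}(b_i, b_j)
```

Each node potential $\psi_i$ scores one amino acid's location against the local evidence, and each edge potential $\psi_{ij}$ encodes a constraint between a pair of amino acids; $E$ is the edge set of the graph.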

  12. Approximate Inference • The best structure is intractable to calculate, i.e., we cannot infer the underlying structure analytically • Phase 2 uses Loopy Belief Propagation (BP) to approximate the solution • Local, message-passing scheme • Distributes evidence between nodes

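  13. Loopy Belief Propagation • [Figure: nodes LYS31 and LEU32 with beliefs p_LYS31 and p_LEU32; message m_{LYS31→LEU32} passed from LYS31 to LEU32]

  14. Loopy Belief Propagation • [Figure: nodes LYS31 and LEU32 with beliefs p_LYS31 and p_LEU32; message m_{LEU32→LYS31} passed from LEU32 to LYS31]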
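The message-passing idea on the slides above can be sketched on a toy problem. The code below is a minimal sum-product loopy BP for a small pairwise MRF with discrete states; it is an illustration only, not ACMI's implementation. The function name `loopy_bp`, the damping scheme, and all potentials are assumptions made for this example.

```python
import numpy as np

def loopy_bp(unary, pairwise, edges, n_iters=50, damping=0.5):
    """Toy sum-product loopy BP on a pairwise MRF with discrete states.

    unary:    dict node -> sequence of nonnegative potentials, one per state
    pairwise: dict (i, j) -> array psi[x_i, x_j], one entry per undirected edge
    edges:    list of (i, j) pairs, each undirected edge listed once
    """
    msgs, neighbors = {}, {n: [] for n in unary}
    for i, j in edges:
        # One message per direction, initialised to uniform.
        msgs[(i, j)] = np.ones(len(unary[j])) / len(unary[j])
        msgs[(j, i)] = np.ones(len(unary[i])) / len(unary[i])
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j):
        # Edge potential oriented as [x_i, x_j].
        return pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T

    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            # Combine node i's evidence with all incoming messages except j's.
            prod = np.asarray(unary[i], dtype=float)
            for k in neighbors[i]:
                if k != j:
                    prod = prod * msgs[(k, i)]
            m = psi(i, j).T @ prod          # sum over x_i
            m = m / m.sum()                 # normalise for stability
            new[(i, j)] = damping * msgs[(i, j)] + (1 - damping) * m
        msgs = new

    beliefs = {}
    for i in unary:
        b = np.asarray(unary[i], dtype=float)
        for k in neighbors[i]:
            b = b * msgs[(k, i)]
        beliefs[i] = b / b.sum()            # approximate marginal of node i
    return beliefs

# Two-node example with made-up potentials (two candidate locations each).
unary = {"LYS31": [0.6, 0.4], "LEU32": [0.5, 0.5]}
pairwise = {("LYS31", "LEU32"): np.array([[2.0, 1.0], [1.0, 2.0]])}
beliefs = loopy_bp(unary, pairwise, edges=[("LYS31", "LEU32")])
```

On a graph without cycles, such as this two-node example, the beliefs converge to the exact marginals; on a loopy graph they are only approximations, which is the room for improvement the talk goes on to address.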

  15. Shortcomings of Phase 2 • Inference is very difficult • ~1,000,000 possible outputs for one amino acid • ~250–1,250 amino acids in one protein • Evidence is noisy • O(N²) constraints • Approximate solutions, room for improvement

  16. Outline • Protein Structures • Prior Work on ACMI • Probabilistic Ensembles in ACMI (PEA) • Experiments and Results

  17. Ensemble Methods • Ensembles: the use of multiple models to improve predictive performance • Tend to outperform the best single model [Dietterich '00] • E.g., the Netflix Prize

  18. Phase 2: Standard ACMI • MRF → Protocol → P(b_k)

  19. Phase 2: Ensemble ACMI • MRF → Protocol 1 → P_1(b_k) • MRF → Protocol 2 → P_2(b_k) • … • MRF → Protocol C → P_C(b_k)

  20. Probabilistic Ensembles in ACMI (PEA) • New ensemble framework (PEA) • Run inference multiple times, under different conditions • Output: multiple, diverse estimates of each amino acid’s location • Phase 2 now has several probability distributions for each amino acid. So what?

  21. ACMI Outline • Phase 1: Perform Local Match, yielding the prior probability of each AA’s location • Phase 2: Apply Global Constraints, yielding the posterior probability of each AA’s location • Phase 3: Sample Structure, yielding all-atom protein structures • [Figure labels: b_{k-1}, b_k, b_{k+1}, b^{*1…M}]

  22. Backbone Step (Prior work) • Place next backbone atom • (1) Sample b_k from the empirical Cα-Cα-Cα pseudoangle distribution • [Figure: placed atoms b_{k-2}, b_{k-1}; candidate positions b'_k marked "?"]

  23. Backbone Step (Prior work) • Place next backbone atom • (2) Weight each sample by its Phase 2 computed marginal • [Figure: placed atoms b_{k-2}, b_{k-1}; candidates b'_k with weights 0.25, 0.20, …, 0.15]

  24. Backbone Step (Prior work) • Place next backbone atom • (3) Select b_k with probability proportional to sample weight • [Figure: placed atoms b_{k-2}, b_{k-1}; candidates b'_k with weights 0.25, 0.20, …, 0.15]
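Steps (2) and (3) of the backbone step above amount to weighted random selection. A minimal sketch follows, assuming the candidates from step (1) are already available; the function name `select_candidate` and the candidate labels are hypothetical, and only the three weights visible on the slide (0.25, 0.20, 0.15) are used, although the slide's "…" indicates there are more candidates.

```python
import numpy as np

def select_candidate(candidates, weights, rng):
    """Pick one candidate with probability proportional to its weight.

    candidates: list of candidate positions for b_k (any objects)
    weights:    Phase 2 marginal of each candidate (nonnegative, not all zero)
    rng:        numpy random Generator
    """
    weights = np.asarray(weights, dtype=float)
    probs = weights / weights.sum()            # normalise weights to probabilities
    idx = rng.choice(len(candidates), p=probs)
    return candidates[idx], probs

rng = np.random.default_rng(0)
candidates = ["pos_a", "pos_b", "pos_c"]       # placeholder labels, not real coordinates
weights = [0.25, 0.20, 0.15]                   # the three weights shown on the slide
chosen, probs = select_candidate(candidates, weights, rng)
```

Sampling in proportion to weight, rather than always taking the highest-weight candidate, keeps some diversity across the structures produced in Phase 3.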

  25. Backbone Step for PEA • Ensemble components score the candidate b'_k: P_1(b'_k) = 0.23, P_2(b'_k) = 0.15, P_C(b'_k) = 0.04 • An aggregator combines them into a single weight w(b'_k) • [Figure: placed atoms b_{k-2}, b_{k-1}; candidate b'_k marked "?"]

  26. Backbone Step for PEA: Average • P_1(b'_k) = 0.23, P_2(b'_k) = 0.15, P_C(b'_k) = 0.04 • AVG gives w(b'_k) = 0.14

  27. Backbone Step for PEA: Maximum • P_1(b'_k) = 0.23, P_2(b'_k) = 0.15, P_C(b'_k) = 0.04 • MAX gives w(b'_k) = 0.23

  28. Backbone Step for PEA: Sample • P_1(b'_k) = 0.23, P_2(b'_k) = 0.15, P_C(b'_k) = 0.04 • SAMP gives w(b'_k) = 0.15
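The three aggregators on the slides above can be sketched in a few lines. The values 0.23, 0.15, and 0.04 are those shown on the slides; the function name `aggregate` is hypothetical, and SAMP is assumed here to pick one component's value uniformly at random, which the slides do not state explicitly.

```python
import numpy as np

def aggregate(component_probs, method, rng=None):
    """Combine per-component marginals P_1(b'_k), ..., P_C(b'_k) into one weight."""
    probs = np.asarray(component_probs, dtype=float)
    if method == "AVG":
        return float(probs.mean())
    if method == "MAX":
        return float(probs.max())
    if method == "SAMP":
        # Assumption: choose one component's value uniformly at random.
        if rng is None:
            rng = np.random.default_rng()
        return float(rng.choice(probs))
    raise ValueError(f"unknown aggregator: {method}")

component_probs = [0.23, 0.15, 0.04]
w_avg = aggregate(component_probs, "AVG")    # 0.14, as on the Average slide
w_max = aggregate(component_probs, "MAX")    # 0.23, as on the Maximum slide
w_samp = aggregate(component_probs, "SAMP", np.random.default_rng(0))
```

The aggregated weight then takes the place of the single Phase 2 marginal when candidates are weighted in the backbone step.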

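  29. Review: Previous work on ACMI • Phase 2: a single protocol produces P(b_k) • Phase 3: candidates weighted 0.25, 0.20, …, 0.15 • [Figure: placed atoms b_{k-2}, b_{k-1}]

  30. Review: PEA • Phase 2: multiple protocols, combined by an aggregator (AGG) • Phase 3: candidates weighted 0.14, 0.26, …, 0.05 • [Figure: placed atoms b_{k-2}, b_{k-1}]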

  31. Outline • Protein Structures • Prior Work on ACMI • Probabilistic Ensembles in ACMI (PEA) • Experiments and Results

  32. Experimental Methodology • PEA (Probabilistic Ensembles in ACMI): 4 ensemble components; aggregators AVG, MAX, SAMP • ACMI: ORIG – standard ACMI (prior work); EXT – run inference 4 times as long; BEST – test best of 4 PEA components

  33. Phase 2 Results • [Figure] • *p-value < 0.01

  34. Protein Structure Results • [Figure panels: Completeness, Correctness] • *p-value < 0.05

  35. Protein Structure Results

  36. Impact of Ensemble Size

  37. Conclusions • ACMI is the state-of-the-art method for determining protein structures in poor-resolution images • Probabilistic Ensembles in ACMI (PEA) improves approximate inference and produces better protein structures • Future work: general solution for inference; larger ensemble size

  38. Acknowledgements • Phillips Laboratory at UW-Madison • UW Center for Eukaryotic Structural Genomics (CESG) • NLM R01-LM008796 • NLM Training Grant T15-LM007359 • NIH Protein Structure Initiative Grant GM074901 • Thank you!
