
Probabilistic Methods for Interpreting Electron-Density Maps

Frank DiMaio

University of Wisconsin – Madison

Computer Sciences Department



3D Protein Structure

[Figure: a 3D protein structure with its backbone, sidechains, and C-alpha atoms labeled; a second view labels residues ALA, LEU, PRO, VAL, and ARG, with three residues marked "?".]


High-Throughput Structure Determination

  • Protein-structure determination is important for

    • Understanding the function of a protein

    • Understanding mechanisms

    • Identifying targets for drug design

  • Some proteins produce poor density maps

  • Interpreting poor electron-density maps by hand is very laborious

  • I aim to automatically interpret poor-quality electron-density maps


Electron-Density Map Interpretation

GIVEN: 3D electron-density map, (linear) amino-acid sequence

FIND: All-atom protein model


Density Map Resolution

[Figure: example density maps at 1.0Å, 2.0Å, 3.0Å, and 4.0Å resolution, annotated with prior methods (Ioerger et al. 2002; Terwilliger 2003; Morris et al. 2003) and the range marked "My focus".]


Thesis Contributions

  • A probabilistic approach to protein-backbone tracing. DiMaio et al., Intelligent Systems for Molecular Biology (2006)

  • Improved template matching in electron-density maps. DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007)

  • Creating all-atom protein models using particle filtering. DiMaio et al. (under review)

  • Pictorial structures for atom-level molecular modeling. DiMaio et al., Advances in Neural Information Processing Systems (2004)

  • Improving the efficiency of belief propagation. DiMaio and Shavlik, IEEE International Conference on Data Mining (2006)

  • Iterative phase improvement in ACMI


ACMI Overview

  • Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)

    • Independent amino-acid search

    • Templates model 5-mer conformational space

  • Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)

    • Protein structural constraints refine local search

    • Markov random field (MRF) models pairwise constraints

  • Phase 3: Sample all-atom models

    • Particle filtering samples high-probability structures

    • Probabilities from the MRF guide particle trajectories


5-mer Lookup

[Figure: 5-mers from the sequence (…SAWCVKFEKPADKNGKTE…) are looked up in a protein database ("Protein DB") to retrieve templates.]

  • ACMI searches the map for each template independently

  • Spherical-harmonic decomposition allows rapid search of all template rotations


Spherical-Harmonic Decomposition

[Figure: a function f(θ,φ) on the sphere decomposed into spherical harmonics.]
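For reference, the standard spherical-harmonic expansion of a function on the sphere is written out below. The coefficient symbol a_{lm} and the band limit B are notation assumed here, not taken from the slides, and the slides do not say which normalization ACMI uses.

```latex
\[
  f(\theta,\phi) \;\approx\; \sum_{l=0}^{B} \sum_{m=-l}^{l} a_{lm}\, Y_l^m(\theta,\phi),
  \qquad
  a_{lm} \;=\; \int_{0}^{2\pi}\!\!\int_{0}^{\pi}
      f(\theta,\phi)\, \overline{Y_l^m(\theta,\phi)}\, \sin\theta \; d\theta\, d\phi .
\]
```

Rotating f only mixes coefficients that share the same degree l, which is the property that lets a rotation search work on coefficients rather than on resampled density.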


5-mer Fast Rotation Search

[Figure: a region of the electron-density map (sampled density in a 5Å sphere) and a pentapeptide fragment from the PDB (the “template”; calculated, i.e. expected, density in a 5Å sphere) are each sampled in spherical shells.]

[Figure: the sampled map region and the sampled template density are each converted to spherical-harmonic coefficients; the fast-rotation function (Navaza 2006, Risbo 1996) then gives the correlation coefficient as a function of rotation.]


Convert Scores to Probabilities

[Figure: scanning the density map for a fragment yields correlation coefficients t_i(u_i) over the density map; Bayes’ rule converts these into a probability distribution over the density map, P(5-mer at u_i | EDM).]
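A minimal sketch of how Bayes' rule can turn correlation scores into a distribution over map locations. The slides do not give the likelihood model, so everything here is an assumption made for illustration: scores at the true location and at background locations are modeled as Gaussians with made-up parameters, the fragment is assumed to sit at exactly one grid point, and the prior over grid points is uniform. The function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def scores_to_probabilities(scores, match=(0.6, 0.15), background=(0.2, 0.15)):
    """Posterior probability that the fragment sits at each grid point.

    Assumed model (not from the slides): exactly one grid point holds the
    fragment, the prior is uniform, and scores are independent Gaussians
    with (mean, std) given by `match` or `background`. Under that model
    Bayes' rule reduces to normalizing the per-point likelihood ratios.
    """
    ratios = [
        gaussian_pdf(s, *match) / gaussian_pdf(s, *background) for s in scores
    ]
    total = sum(ratios)
    return [r / total for r in ratios]


if __name__ == "__main__":
    correlation = [0.10, 0.25, 0.70, 0.30]
    for score, prob in zip(correlation, scores_to_probabilities(correlation)):
        print(f"score={score:.2f}  P(5-mer here | EDM)={prob:.4f}")
```

Normalizing likelihood ratios rather than raw scores is what lets a modest correlation at one point still dominate when every other point looks like background.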


Probabilistic Backbone Model

  • A trace assigns a position and orientation u_i = {x_i, q_i} to each amino acid i

  • The probability of a trace U = {u_i} is [equation not preserved in the transcript]

  • This full joint probability is intractable to compute

  • Approximate it using a pairwise Markov field


Pairwise Markov-Field Model

[Figure: a graph with one vertex per amino acid (ALA, GLY, LYS, LEU, SER).]

  • Joint probabilities are defined on a graph as a product of vertex and edge potentials
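The product of vertex and edge potentials can be written in the generic pairwise Markov-field form below. The symbols \psi_i, \psi_{ij}, and Z are notation assumed here; the slide's own equation did not survive in the transcript.

```latex
\[
  P(U \mid \mathrm{EDM}) \;=\; \frac{1}{Z}\,
     \prod_{i} \psi_i(u_i)\;
     \prod_{i<j} \psi_{ij}(u_i, u_j)
\]
```

Here \psi_i is the vertex potential for amino acid i, \psi_{ij} is the edge potential between amino acids i and j, and Z normalizes the product into a probability.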


ACMI’s Backbone Model

[Figure: a graph over amino acids ALA, GLY, LYS, LEU, SER.]

  • Observational potentials tie the map to the model

  • Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in proper orientation

  • Occupancy constraints ensure nonadjacent amino acids do not occupy the same 3D space
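A minimal sketch of what the two kinds of edge potential might look like. The functional forms are assumptions for illustration only: a Gaussian around the ~3.8Å Cα spacing with a made-up tolerance, and a hard cutoff at a made-up 3.0Å minimum separation. The orientation part of the adjacency constraint is left out, and the function names are hypothetical.

```python
import math


def adjacency_potential(x_i, x_j, ideal=3.8, tolerance=0.4):
    """Score for two sequence-adjacent C-alphas at positions x_i and x_j.

    Assumed form: an unnormalized Gaussian in the distance, peaking at 3.8
    angstroms. The tolerance is a hypothetical value.
    """
    distance = math.dist(x_i, x_j)
    return math.exp(-0.5 * ((distance - ideal) / tolerance) ** 2)


def occupancy_potential(x_i, x_j, min_separation=3.0):
    """Score for two nonadjacent C-alphas.

    Assumed form: zero if the two positions are closer than a hypothetical
    minimum separation, one otherwise.
    """
    return 0.0 if math.dist(x_i, x_j) < min_separation else 1.0


if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    print(adjacency_potential(origin, (3.8, 0.0, 0.0)))
    print(adjacency_potential(origin, (5.0, 0.0, 0.0)))
    print(occupancy_potential(origin, (1.0, 0.0, 0.0)))
```

The occupancy potential is 1 almost everywhere, which is the sense in which potentials between nonadjacent amino acids are "weak".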


Backbone Model Potential

[Equation not preserved in the transcript; the slides annotate its terms as follows.]

  • Constraints between adjacent amino acids

  • Constraints between all other amino-acid pairs

  • Observational (“template-matching”) probabilities


Inferring Backbone Locations

  • Want to find the backbone layout that maximizes [equation not preserved in the transcript]

  • Exact methods are intractable

  • Use belief propagation (Pearl 1988) to approximate marginal distributions
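To make the message-passing idea concrete, here is a minimal sum-product sketch on a small chain with discrete states. It is a toy: ACMI's graph connects every pair of amino acids, so belief propagation there is only approximate, whereas on a chain the same updates give exact marginals. The function name and the data layout are assumptions of this sketch.

```python
def chain_bp(unary, pairwise):
    """Sum-product belief propagation on a chain of discrete variables.

    unary[i][s]       vertex potential of node i in state s
    pairwise[i][s][t] edge potential between node i (state s) and
                      node i+1 (state t)
    All nodes are assumed to share the same number of states.
    Returns one normalized marginal distribution per node.
    """
    n = len(unary)
    k = len(unary[0])
    # from_left[i] is the message arriving at node i from node i-1.
    from_left = [[1.0] * k for _ in range(n)]
    # from_right[i] is the message arriving at node i from node i+1.
    from_right = [[1.0] * k for _ in range(n)]

    for i in range(1, n):
        for t in range(k):
            from_left[i][t] = sum(
                unary[i - 1][s] * from_left[i - 1][s] * pairwise[i - 1][s][t]
                for s in range(k)
            )
    for i in range(n - 2, -1, -1):
        for s in range(k):
            from_right[i][s] = sum(
                unary[i + 1][t] * from_right[i + 1][t] * pairwise[i][s][t]
                for t in range(k)
            )

    marginals = []
    for i in range(n):
        belief = [unary[i][s] * from_left[i][s] * from_right[i][s] for s in range(k)]
        total = sum(belief)
        marginals.append([b / total for b in belief])
    return marginals


if __name__ == "__main__":
    unary = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
    smooth = [[0.9, 0.1], [0.1, 0.9]]  # neighbors prefer the same state
    for i, marginal in enumerate(chain_bp(unary, [smooth, smooth])):
        print(i, [round(p, 4) for p in marginal])
```

Each message is the sender's potential times its other incoming messages, summed through the edge potential, and a node's approximate marginal is its potential times all incoming messages.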


Belief Propagation Example

[Figure: node LYS31 sends the message m_{LYS31→LEU32} to node LEU32, and LEU32 then sends m_{LEU32→LYS31} back; the nodes carry approximate marginals p̂_{LYS31} and p̂_{LEU32}.]


Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006)

  • Naïve implementation: O(N²G²)

    • N = the number of amino acids in the protein

    • G = the number of points in the discretized density map

  • O(G²) computation for each message passed

    • O(G log G) as Fourier-space multiplication

  • O(N²) messages computed and stored

    • Approximate (N-3) occupancy messages with 1 message

    • O(N) messages using a message accumulator

  • Improved implementation: O(NG log G)
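The O(G²) to O(G log G) step can be illustrated with a toy example. When an edge potential depends only on the offset between two grid positions, the sum inside a message is a convolution, and a convolution becomes a pointwise multiplication in Fourier space. The sketch below assumes a 1D periodic grid whose length is a power of two and uses a small hand-written FFT so it needs only the standard library; none of this is taken from ACMI's implementation.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    With invert=True this computes the unnormalized inverse transform.
    """
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_convolve_naive(f, g):
    """O(G^2): out[j] = sum_i f[i] * g[(j - i) mod G]."""
    n = len(f)
    return [sum(f[i] * g[(j - i) % n] for i in range(n)) for j in range(n)]


def circular_convolve_fft(f, g):
    """O(G log G): multiply the transforms, then transform back."""
    n = len(f)
    spectrum = [a * b for a, b in zip(fft(f), fft(g))]
    return [value.real / n for value in fft(spectrum, invert=True)]


if __name__ == "__main__":
    belief = [0.1, 0.4, 0.2, 0.0, 0.0, 0.3, 0.0, 0.0]     # node potential times incoming messages
    potential = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]  # depends only on the offset
    print([round(v, 4) for v in circular_convolve_naive(belief, potential)])
    print([round(v, 4) for v in circular_convolve_fft(belief, potential)])
```

Both functions return the same message; only the cost differs, which matters when G is the number of points in a 3D density map.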


Occupancy Message Approximation

  • To pass a message from amino acid i to amino acid j, combine the occupancy edge potential with the product of incoming messages to i, except the one from j

  • “Weak” potentials between nonadjacent amino acids let us approximate this using the product of all incoming messages to i

[Figure: six nodes, 1 to 6, first exchanging occupancy messages pairwise, then through a central accumulator (ACC).]

  • Send each outgoing occupancy message product to a central accumulator

  • Then, each node’s incoming message product is computed in constant time
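A minimal sketch of the accumulator idea, under stated assumptions. Each node is assumed to broadcast a single occupancy message (one value per grid point) to all nonadjacent nodes. The accumulator holds the product of every node's broadcast, kept as a sum of logs, and a node recovers its own incoming occupancy product by removing the contributions of itself and its two sequence neighbors. Excluding the neighbors is this sketch's reading of "(N-3) occupancy messages", and the function name is hypothetical.

```python
import math


def incoming_occupancy_products(outgoing):
    """Incoming occupancy-message product for every node.

    outgoing[i][p] is node i's broadcast occupancy message at grid point p;
    all values are assumed to be strictly positive.
    """
    n = len(outgoing)
    grid = len(outgoing[0])
    logs = [[math.log(v) for v in message] for message in outgoing]

    # Central accumulator: log of the product of all broadcasts, per grid point.
    accumulator = [sum(logs[i][p] for i in range(n)) for p in range(grid)]

    incoming = []
    for j in range(n):
        # Node j and its sequence neighbors do not send j an occupancy message.
        excluded = [i for i in (j - 1, j, j + 1) if 0 <= i < n]
        incoming.append([
            math.exp(accumulator[p] - sum(logs[i][p] for i in excluded))
            for p in range(grid)
        ])
    return incoming


if __name__ == "__main__":
    broadcasts = [
        [0.9, 0.5, 1.0],
        [0.8, 1.0, 0.7],
        [1.0, 0.6, 0.9],
        [0.7, 0.9, 0.8],
        [0.95, 0.85, 0.75],
    ]
    for j, product in enumerate(incoming_occupancy_products(broadcasts)):
        print(j, [round(v, 4) for v in product])
```

Building the accumulator costs O(N) messages in total, and each node then needs at most three subtractions per grid point instead of a product over N-3 separate messages.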


BP Output

  • After some number of iterations, BP gives probability distributions over Cα locations

[Figure: marginal distributions for residues ARG, LEU, PRO, ALA, VAL.]


ACMI’s Backbone Trace

  • Independently choose, for each amino acid, the Cα location that maximizes its approximate marginal distribution
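The independent choice amounts to an argmax per amino acid. A minimal sketch, assuming each marginal is a list of probabilities indexed by grid point (the function name and layout are hypothetical):

```python
def max_marginal_trace(marginals):
    """Pick, for each amino acid, the grid index with the largest marginal."""
    return [
        max(range(len(marginal)), key=marginal.__getitem__)
        for marginal in marginals
    ]


if __name__ == "__main__":
    marginals = [
        [0.1, 0.7, 0.2],  # amino acid 1
        [0.2, 0.5, 0.3],  # amino acid 2
    ]
    print(max_marginal_trace(marginals))
```

Because every amino acid is decided on its own, nothing prevents two of them from selecting the same grid point, as happens with the toy marginals above.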


Example: 1XRI

  • 3.3Å resolution density map

  • 39° mean phase error

  • 0.9009Å RMSd

  • 93% complete

[Figure: predicted trace colored by prob(AA at location), from LOW (0.1) to HIGH (0.9).]


Testset Density Maps (raw data)

[Figure: scatter plot of the test-set density maps. x-axis: density-map resolution (Å), 1.0 to 4.0; y-axis: density-map mean phase error (deg.), 15 to 75.]


Experimental Accuracy

[Figure: bar chart comparing ACMI, ARP/wARP, Resolve, and Textal. y-axis: % Cα’s located within 2Å of some Cα / correct Cα (0 to 100); bars show % backbone correctly placed and % amino acids correctly identified.]


Experimental Accuracy on a Per-Protein Basis

[Figure: three scatter plots of ACMI % Cα’s located (y-axis, 0 to 100) against ARP/wARP, Resolve, and Textal % Cα’s located (x-axes, 0 to 100).]


Problems with ACMI

  • Biologists want the location of all atoms

  • All Cα’s lie on a discrete grid

  • Maximum-marginal backbone model may be physically unrealistic

  • Ignoring a lot of information

  • Multiple models may better represent conformational variation within the crystal

[Figure: three structures with probabilities 0.4, 0.35, and 0.25, shown alongside the maximum-marginal structure.]


ACMI with Particle Filtering (ACMI-PF)

Idea: Represent the protein using a set of static 3D all-atom protein models


Particle Filtering Overview (Doucet et al. 2000)

  • Given some Markov process x_{1:K} ∈ X with observations y_{1:K} ∈ Y

  • Particle filtering approximates some posterior probability distribution over X using a set of N weighted point estimates

  • The Markov process gives a recursive formulation

  • Use an importance function q(x_k | x_{0:k-1}, y_k) to grow particles

  • Recursive weight update [equation not preserved in the transcript]
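For reference, the textbook sequential-importance-sampling relations for a Markov process are written out below in the slide's notation. These are the standard forms found in the particle-filtering literature, offered as a likely reading of the lost equations rather than a copy of them; w_k^i denotes the weight of particle i after step k.

```latex
\[
  p(x_{0:k} \mid y_{1:k}) \;\approx\; \sum_{i=1}^{N} w_k^{i}\,
      \delta\!\left(x_{0:k} - x_{0:k}^{i}\right),
  \qquad
  x_k^{i} \sim q\!\left(x_k \mid x_{0:k-1}^{i},\, y_k\right)
\]
\[
  w_k^{i} \;\propto\; w_{k-1}^{i}\,
     \frac{p\!\left(y_k \mid x_k^{i}\right)\, p\!\left(x_k^{i} \mid x_{k-1}^{i}\right)}
          {q\!\left(x_k^{i} \mid x_{0:k-1}^{i},\, y_k\right)}
\]
```

The weights are renormalized to sum to one after every update.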


Particle Filtering for Protein Structures

  • A particle refers to one specific 3D layout of some subsequence of the protein

  • At each iteration, advance a particle’s trajectory by placing an additional amino acid’s atoms

  • Alternate extending the chain left and right

  • An iteration alternately places

    • Cα position b_{k+1} given b_k

    • All sidechain atoms s_k given b_{k-1:k+1}

  • Key idea: Use the conditional distribution p(b_k | b^i_{k-1}, Map) to advance particle trajectories

  • Construct this conditional distribution from BP’s marginal distributions

[Figure: consecutive Cα positions b_{k-1}, b_k, b_{k+1} and the sidechain s_k.]

Particle Filtering for Protein Structures: Algorithm

place “seeds” b^i_k for each particle i = 1…N

while amino acids remain

    place b^i_{k+1} / b^i_{j-1} given b^i_{j:k} for each i = 1…N

    place s^i_k given b^i_{k-1:k+1} for each i = 1…N

    optionally resample N particles

end while


Backbone Step (for particle i)

place b^i_{k+1} given b^i_k, for each i = 1…N

(1) Sample L candidate positions b^{1…L}_{k+1} from the b_{k-1}–b_k–b_{k+1} pseudoangle distribution

(2) Weight each sample by its ACMI-computed approximate marginal, p_{k+1}(b^l_{k+1}) for l = 1…L

(3) Select b_{k+1} with probability proportional to sample weight

(4) Update particle weight as sum of sample weights
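The four backbone sub-steps can be sketched as one function. This is an illustration under assumptions, not ACMI-PF's code: `propose` stands in for sampling from the pseudoangle distribution, `marginal` stands in for looking up the approximate marginal from belief propagation, and the particle weight is multiplied by the sum of sample weights, which is one reading of step (4). All names are hypothetical.

```python
import random


def backbone_step(particle_weight, propose, marginal, num_samples, rng):
    """Extend one particle by one C-alpha position.

    propose(rng)  -> a candidate position (stand-in for the pseudoangle
                     distribution given the previous two C-alphas)
    marginal(pos) -> approximate marginal probability of that position
    Returns (chosen_position, updated_particle_weight).
    """
    # (1) Sample L candidate positions.
    candidates = [propose(rng) for _ in range(num_samples)]
    # (2) Weight each candidate by its approximate marginal.
    weights = [marginal(candidate) for candidate in candidates]
    total = sum(weights)
    if total == 0.0:
        # No candidate has any support; the particle is dead.
        return None, 0.0
    # (3) Select one candidate with probability proportional to its weight.
    chosen = rng.choices(candidates, weights=weights, k=1)[0]
    # (4) Fold the sum of sample weights into the particle weight.
    return chosen, particle_weight * total


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [(3.8, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 0.0, 3.8)]
    toy_marginal = {grid[0]: 0.6, grid[1]: 0.3, grid[2]: 0.1}
    position, weight = backbone_step(
        1.0, lambda r: r.choice(grid), toy_marginal.get, num_samples=5, rng=rng
    )
    print(position, weight)
```

Candidates come from the geometric prior and are only scored by the map-derived marginal, so each extension stays close to plausible Cα geometry by construction.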


Sidechain Step (for particle i)

place s^i_k given b^i_{k-1:k+1}, for each i = 1…N

(1) Sample s_k from a database of sidechain conformations (Protein Data Bank)

(2) For each sidechain conformation, compute the probability of the density map given the sidechain, p_k(EDM | s_k)

(3) Select a sidechain conformation from this weighted distribution

(4) Update particle weight as sum of sample weights


Particle Resampling

[Figure: a set of weighted particles (weights between 0.1 and 0.4) before and after resampling.]
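A minimal sketch of one common resampling scheme, multinomial resampling: draw N particles with replacement in proportion to their weights, then reset all weights to 1/N. The slides do not say which resampling variant ACMI-PF uses, so treat this choice and the function name as assumptions.

```python
import random


def resample(particles, weights, rng):
    """Multinomial resampling.

    Draws len(particles) particles with replacement, each chosen with
    probability proportional to its weight, and returns them together
    with uniform weights.
    """
    count = len(particles)
    survivors = rng.choices(particles, weights=weights, k=count)
    return survivors, [1.0 / count] * count


if __name__ == "__main__":
    rng = random.Random(42)
    particles = ["A", "B", "C", "D"]
    weights = [0.4, 0.3, 0.2, 0.1]
    print(resample(particles, weights, rng))
```

High-weight particles tend to be duplicated and low-weight ones dropped, which concentrates later sampling effort on the more probable partial structures.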


Amino-Acid Sampling Order

  • Begin at some amino acid k with probability [equation not preserved in the transcript]

  • At each step, move left (to j-1) or right (to k+1) with probability [equation not preserved in the transcript]


Experimental Methodology

  • Run ACMI-PF 10 times with 100 particles each

  • Return the highest-weight particle from each run

  • Each run samples amino acids in a different order

  • Refine each structure for 10 iterations in Refmac5

  • Compare the 10-structure model to others using Rfree


ACMI-PF Versus ACMI-Naïve

[Figure: refined Rfree as a function of the number of ACMI-PF runs.]

Additionally, ACMI-PF’s models have …

  • Fewer gaps (10 vs. 28)

  • Lower sidechain RMS error (2.1Å vs. 2.3Å)


ACMI-PF Versus Others

[Figure: three scatter plots of ACMI-PF Rfree against ARP/wARP, Resolve, and Textal Rfree; all axes run from 0.25 to 0.65.]


ACMI-PF Example: 2A3Q

  • 2.3Å resolution

  • 66° mean phase error

  • 1.79Å RMSd

  • 92% complete


ACMI Overview

  • Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)

    • Independent amino-acid search

    • Templates model 5-mer conformational space

  • Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)

    • Protein structural constraints refine local search

    • Markov random field (MRF) models pairwise constraints

  • Phase 3: Sample all-atom models

    • Particle filtering samples high-probability structures

    • Probabilities from the MRF guide particle trajectories

  • Phase 4: Iterative phase improvement

    • Use particle-filtering models to improve density-map quality

    • Rerun the entire pipeline on the improved density map

    • Repeat until convergence


Phase Problem

  • Intensities: measured by X-ray crystallography

  • Phases: experimentally estimated (e.g. MAD, MIR)


Density-Map Phasing

[Figure: density maps at 30°, 60°, and 75° mean phase error.]


Iterative Phase Improvement

[Figure: a cycle linking the initial density map, the predicted 3D model, and the revised density map.]


ACMI-PF’s Phase Improvement

[Figure: scatter plot of error in ACMI-PF’s phases against error in initial phases; both axes in degrees of mean phase error, 0 to 75.]


Two-Iteration ACMI

[Figure: scatter plot of % backbone located in iteration 2 against % backbone located in iteration 1; both axes run from 50 to 100.]


Future Work: Many-iteration ACMI

[Figure: two plots against the number of ACMI iterations, one of average % uninterpreted AAs and one of average mean phase error; values for later iterations are marked "?".]


Conclusions

  • ACMI’s three steps construct a set of all-atom protein models from a density map

  • Novel message approximation allows inference on large, highly connected models

  • Resulting protein models are more accurate than those produced by other methods


Ongoing and Future Work

  • Incorporate additional structural biology background knowledge

  • Incorporate more complex potential functions

  • Further work on iterative phase improvement

  • Generalize my algorithms to other 3D image data


Acknowledgements

  • Advisor Jude Shavlik

  • Committee

    • George Phillips

    • Charles Dyer

    • David Page

    • Mark Craven

  • Collaborators

    • Ameet Soni

    • Dmitry Kondrashov

    • Eduard Bitto

    • Craig Bingman

  • 6th floor MSCers

  • Center for Eukaryotic Structural Genomics

  • Funding

    • UW-Madison Graduate School

    • NLM 1T15 LM007359

    • NLM 1R01 LM008796

