Applications of Voronoi tessellations in protein structure prediction and analysis. Brendan McConkey, Department of Biology, University of Waterloo. Q. Du, V. Faber, M. Gunzburger (1999). Centroidal Voronoi Tessellations: Applications and Algorithms. SIAM Review 41(4):637-676.

Presentation Transcript
slide1

Applications of Voronoi tessellations in

protein structure prediction and analysis

Brendan McConkey

Department of Biology

University of Waterloo

slide2

Q. Du, V. Faber, M. Gunzburger (1999). Centroidal Voronoi Tessellations: Applications and Algorithms. SIAM Review 41(4):637-676.

slide4

Wigner-Seitz Cells

Soap Bubbles in Frame. Fig. 52 from Soap Bubbles, Their Colors and Forces which Mold Them. C.V. Boys.

The distribution of McDonald's Restaurants in San Francisco.

http://www.snibbe.com/scott

http://www.chembio.uoguelph.ca/educmat/chm729/wscells/start.htm

slide5

Part of a dragonfly's wing. Fig. 162. From On Growth and Form. D'Arcy Thompson.

"Reticulum Plasmatique." Fig. 321. From On Growth and Form. D'Arcy Thompson.

http://www.snibbe.com/scott

slide6

Frogs' eggs showing various partitionings of first eight cells. Fig. 257.

From On Growth and Form. D'Arcy Thompson.

http://www.snibbe.com/scott

slide7

Applications in protein structure analysis:

    • scoring functions for protein folding (statistical assessment of contacts within a protein)
    • generation of protein Voronoi contact maps (2D targets for structure prediction)
    • calculating surfaces, areas, and volumes of atoms and amino acid residues
slide8

Structure prediction methods

  • The structure prediction problem may be divided into two related tasks:
      • A search procedure
          • comparative modeling
          • ab initio prediction
      • An energetic or scoring function
          • physicochemical potentials
          • statistical potentials
slide9

What determines the structure of a protein?

    • * energetics – structure should have a minimum energy
    • amino acid sequence
    • topology of the protein
    • environment (solvation, membrane interactions)
    • constitutive ligands (ions, heme groups, …)
    • interactions with other proteins, cofactors, ligands
    • - folding of cytosolic proteins is largely driven by desolvation
  • That an amino acid sequence can spontaneously form a functional protein implies that the structure is robust to small changes (structure is in a low energy conformation, and will return to this conformation if perturbed)
slide10

Protein folding energy landscape

  • protein energy landscape is complex, with many local minima
  • believed to have a funnel-like shape, with global minimum representing native structure

image from http://bioinfo.mshri.on.ca/

slide11

Scoring functions

  • Energetic functions
  • $E_{\mathrm{total}} = E_{\mathrm{bonds}} + E_{\mathrm{angles}} + E_{\mathrm{dihedrals}} + E_{\mathrm{van\ der\ Waals}} + E_{\mathrm{electrostatics}} + E_{\mathrm{solvation}} + \ldots$
  • Knowledge-based functions (e.g. statistical pairwise distance potentials)
  • Residue-residue contact potentials
  • Each method type often uses training sets - protein structures solved by experimental methods - to estimate parameters.
slide12

Development of an atom-atom contact scoring function

  • Advantages of contact-based scoring:
  • can treat the solvent accessible surface as an atomic contact, eliminating the need to add corrective terms
  • solvation energy is proportional to the solvent contact area (Eisenberg, 1986)
  • hydrophobic interactions are largely due to desolvation, so are correlated to loss of solvent contact area
  • knowledge-based statistical methodology may be applied to contact areas as well as inter-atomic distances

Contact scores require a reliable quantification of inter-atomic contacts. This can be done using a Voronoi tessellation.
slide13

Defining atom contacts: Voronoi tessellations

Original method: given a set of points in a plane, the plane is divided into polygonal regions with one region per point (Voronoi, 1908). This may be applied to protein structures in three dimensions, and can quantify atom volumes and packing efficiencies for internal atoms (Richards et al, 1974; Tsai et al, 1999).
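
As a rough illustration of the planar definition above, the sketch below labels each cell of a small grid with the index of its nearest generating point, which is the defining property of a Voronoi region. The site coordinates, grid size, and function names are made up for this example; a real protein application would work with exact polyhedra in three dimensions rather than a grid.

```python
import math

def nearest_site(point, sites):
    """Return the index of the site closest to `point` (Euclidean distance)."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

def voronoi_labels(sites, width, height):
    """Label every integer grid cell with the index of its nearest site."""
    return [[nearest_site((x, y), sites) for x in range(width)]
            for y in range(height)]

# Hypothetical generating points in the plane.
sites = [(1.0, 1.0), (6.0, 2.0), (3.0, 6.0)]
for row in voronoi_labels(sites, 8, 8):
    print("".join(str(label) for label in row))
```

Each printed digit marks the region that owns that grid cell; the boundaries between digits approximate the polygon edges.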

slide14

A constrained Voronoi procedure

  • Applied to atom-atom and atom-solvent contacts within proteins,
  • the solvent accessible surface needs to be calculated (shown in blue)
  • atom-atom contacts should be limited to within ~2 atom diameters
  • contact areas should not be dependent on the size of polyhedra

A rapid and exact analytical procedure for calculating volumes, contacts, and solvent accessibility has been developed using this method, termed a constrained Voronoi algorithm.

slide15

Integration of Voronoi tessellations with Solvent Accessible Surface

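plane of separation: $p_{ij}$ is the distance from the centre of atom $i$ to the plane, for atoms $i$ and $j$ with radii $r_i$ and $r_j$ separated by $d_{ij}$; $r_w$ is the solvent probe radius

bisecting plane

$p_{ij} = d_{ij} / 2$

radical plane

$p_{ij} = [d_{ij}^2 + r_i^2 - r_j^2] / (2 d_{ij})$

extended radical plane

$p_{ij} = [d_{ij}^2 + (r_i + r_w)^2 - (r_j + r_w)^2] / (2 d_{ij})$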
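
The three plane definitions can be compared numerically with a short sketch. It assumes that p_ij is a distance measured from the centre of atom i along the line to atom j, so the quadratic expressions are divided by 2·d_ij (this reduces to d_ij / 2 when the radii are equal). The function names, the radii, and the 1.4 Å probe radius are illustrative assumptions, not values taken from the slides.

```python
def bisecting_plane(d_ij):
    """Plane halfway between the two atom centres."""
    return d_ij / 2.0

def radical_plane(d_ij, r_i, r_j):
    """Plane shifted towards the smaller atom according to the radii."""
    return (d_ij ** 2 + r_i ** 2 - r_j ** 2) / (2.0 * d_ij)

def extended_radical_plane(d_ij, r_i, r_j, r_w=1.4):
    """Radical plane computed with both radii expanded by the probe radius r_w.

    r_w = 1.4 Angstroms is an assumed water probe radius.
    """
    return radical_plane(d_ij, r_i + r_w, r_j + r_w)

# Hypothetical atom pair: centres 4.0 Angstroms apart, radii 1.9 and 1.5.
d, ri, rj = 4.0, 1.9, 1.5
print(bisecting_plane(d))
print(radical_plane(d, ri, rj))
print(extended_radical_plane(d, ri, rj))
```

With unequal radii the radical planes move away from the midpoint towards the smaller atom, and the extended version moves further because the probe radius enlarges the difference of squared radii.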

slide16

Calculation of atom-atom contacts

  • to remove the dependency on polyhedra size, the angular contact area is used.
  • contact area is quantified by projecting the polyhedron faces to the surface of a sphere
  • calculated as a sum of spherical triangles and arc segments.
  • provides an exact and continuous estimate of atom-atom contacts
  • permits solvent contacts to be treated as atom contacts
  • approximates loss of solvent accessible surface on folding
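
A minimal sketch of the projection step: the area that a flat polyhedron face covers on a sphere centred on the atom equals the solid angle it subtends times the squared radius. The code below splits a convex face into a fan of triangles and uses the Van Oosterom and Strackee formula for the solid angle of each triangle. This is one standard way to do the calculation and is not necessarily the procedure used in the talk; it does not handle the arc segments that arise where a face is clipped by the sphere, and all names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(dot(u, u))

def triangle_solid_angle(a, b, c):
    """Solid angle (steradians) of triangle abc seen from the origin."""
    numerator = abs(dot(a, cross(b, c)))
    la, lb, lc = norm(a), norm(b), norm(c)
    denominator = (la * lb * lc + dot(a, b) * lc
                   + dot(a, c) * lb + dot(b, c) * la)
    return 2.0 * math.atan2(numerator, denominator)

def projected_face_area(vertices, radius):
    """Area of a convex face projected onto a sphere of the given radius.

    `vertices` are listed in order around the face, relative to the atom centre.
    """
    total = 0.0
    for k in range(1, len(vertices) - 1):
        total += triangle_solid_angle(vertices[0], vertices[k], vertices[k + 1])
    return total * radius ** 2

# One face of a cube centred on the atom covers one sixth of the sphere.
square = [(1, 1, 1), (-1, 1, 1), (-1, -1, 1), (1, -1, 1)]
print(projected_face_area(square, 1.0), 4 * math.pi / 6)
```

Because the result depends only on the directions of the face vertices, moving the face further from the atom along the same rays leaves the projected area unchanged, which is the size independence the slide asks for.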

[Figure: contact areas labelled CA1 to CA5]

slide17

Sample atom-atom contact frequencies

[Figure: contact frequency (0.0 to 0.2) versus contact area (0 to 20 Å²) for N---O, N---Cb, Cb---Cb, and N---N contacts, shown for 153l, 1ads, and 1mcp]

slide18

Calculation of scoring function

  • uses 167 residue specific atom types plus the solvent accessible surface for a total of 168 contact types
  • scores generated from a non-redundant database of 648 proteins
  • A contact potential e is calculated for each of the 167 x 168 possible contacts:
  • Corrected for atom distributions within proteins
  • The score of a protein structure is determined by calculating all non-bonded contacts within the structure and multiplying each by the contact potential: $\mathrm{Score} = \sum e_{i(j)} \, \mathrm{Area}_{i(j)}$, summed over all non-bonded contacts
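
The scoring step described above amounts to a weighted sum over contacts. The sketch below assumes the contacts are already available as (atom type, partner type, area) tuples and that the potentials are stored in a dictionary keyed by the type pair. The atom-type names, areas, and potential values are invented for illustration and are not taken from the talk.

```python
def contact_score(contacts, potential):
    """Sum of contact potential times contact area over all listed contacts."""
    return sum(potential[(atom, partner)] * area
               for atom, partner, area in contacts)

# Hypothetical potentials (arbitrary units per square Angstrom).
potential = {
    ("leu Cd1", "val Cg2"): -0.5,
    ("lys Nz", "Solvent"): -0.2,
    ("leu Cd1", "Solvent"): 0.4,
}

# Hypothetical non-bonded contacts with their areas in square Angstroms.
contacts = [
    ("leu Cd1", "val Cg2", 12.0),
    ("lys Nz", "Solvent", 20.0),
    ("leu Cd1", "Solvent", 5.0),
]

print(contact_score(contacts, potential))
```

Treating "Solvent" as just another partner type is what lets solvent exposure enter the same sum as atom-atom contacts.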
slide19

A few words on the reference state...

  • reference states (expected distributions) have a large influence on statistical scoring functions
  • here, the unfolded protein (maximum possible solvent contact) is used as a reference state
  • results in a closed system, with fixed amount of solvent
  • statistics independent of size of protein
  • consistent with the idea that protein folding is largely due to hydrophobic interactions
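
To show why the reference state matters, here is a sketch of the generic log-odds form used by many knowledge-based potentials, where a contact seen more often than the reference state predicts gets a negative (favourable) value. This form and its sign convention are assumptions for illustration; the slides do not give the exact expression used for the potential.

```python
import math

def contact_potential(observed, expected):
    """Generic log-odds potential: negative when a contact is over-represented.

    `observed` is the contact frequency in the structure database and
    `expected` is the frequency predicted by the chosen reference state.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("frequencies must be positive")
    return -math.log(observed / expected)

# The same observed frequency scored against two hypothetical reference states.
print(contact_potential(0.2, 0.1))
print(contact_potential(0.2, 0.4))
```

The same observed frequency changes sign depending on the expected value, so the choice of reference state directly shapes the potential.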
slide20

Results - contact potential array

[Figure: contact potential array for atom groups 1 to 14 on both axes, with a scale from -3 to 3]

Atom groups:

1. backbone Ca, backbone C, backbone N, backbone O
2. val Cg2, phe Cd1, phe Cd2, tyr Cd1, tyr Cd2, val Cg1, ile Cg2, leu Cd1, ile Cd1, leu Cd2, phe Cz, phe Ce2, phe Ce1, trp Ch2, trp Cz2, tyr Ce1, tyr Ce2, trp Cd1, met Ce, met Sd, trp Cz3, trp Ce3
3. val Cb, leu Cb, phe Cb, tyr Cb, trp Cb, met Cb, met Cg, ile Cg1, ile Cb, leu Cg, cys Cb, cys Cg
4. tyr Oh, his Ce1, his Ne2, arg Cd, arg Ne, lys Cg, gln Cg, asn Cb, ser Cb, asp Cb, glu Cb, gln Cb, thr Cb, arg Cg, arg Cb, lys Cb, pro Cd, his Cd2, thr Cg2, pro Cb, pro Cg, glu Cg, asn Cg, trp Ne1, his Nd1, asp Cg
5. tyr Cg, phe Cg, trp Cg, trp Cd2
6. tyr Cz, trp Ce2
7. his Cb, ala Cb
8. his Cg
9. glu Cd, gln Cd, arg Cz
10. thr Og1, ser Og, asn Nd2, gln Ne2, arg Nh1, arg Nh2, lys Cd, lys Ce
11. lys Nz
12. glu Oe2, glu Oe1, asp Od2, asp Od1
13. gln Oe1, asn Od1
14. Solvent

slide21

Testing of scoring functions

To provide independent tests of protein folding potentials, several groups have created decoy sets, misfolded models of proteins of known structure. An effective scoring function should be able to distinguish native structures from the decoys, and ideally select near-native structures as well. (Decoy sets with corresponding X-ray structures and less than 10% difference in number of atoms were used.)

Decoy sets and sources:

  • EMBL, CASP1: http://prostar.carb.nist.gov (J. Moult, U. of Maryland)
  • 4state, lattice_ssfit, lmds: http://dd.stanford.edu (M. Levitt, Stanford U.)
  • Rosetta: http://depts.washington.edu.bakerpg (D. Baker, U. of Washington)
  • CASP4: http://predictioncenter.llnl.gov/CASP4 (Lawrence Livermore National Laboratory)
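
The decoy test described above reduces to ranking the native structure among the decoys by score. The sketch below assumes that a lower score is better, so the native has rank 1 when no decoy scores below it. The function name and the scores are hypothetical.

```python
def native_rank(native_score, decoy_scores):
    """1-based rank of the native structure, assuming lower scores are better."""
    return 1 + sum(1 for score in decoy_scores if score < native_score)

# Hypothetical scores for one decoy set.
decoys = [-3.0, -5.0, -8.0, -1.5]
print(native_rank(-10.0, decoys))  # native scores below every decoy
print(native_rank(-6.0, decoys))   # one decoy scores below the native
```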

slide22

Testing of scoring functions

[Figure: contact scores for 1ctf decoy set (4state decoys). x-axis: Cα RMSD (Å), 0 to 10; y-axis: Score/atom, -12 to 4]

slide23

Testing of scoring functions

[Figure: contact scores for 1ctf decoy set (4state decoys). x-axis: Cα RMSD (Å), 0 to 10; y-axis: Score/atom, -12 to 4]

slide24

Histograms of native (red) and decoy (blue) scores for the Rosetta decoy monomers

Rank of the native structure in each set:

  • 1acf: rank 1/1000
  • 1aa2: rank 1/1000
  • 1orc: rank 1/1000
  • 1msi: rank 29/1000
  • 1pal: rank 1/1000
  • 1r69: rank 1/1000
  • 1who: rank 1/1000
  • 4fgf: rank 1/1000
  • 5pti: rank 1/1000
  • 5icb: rank 9/1000

slide25

Histograms of native (red) and decoy (blue) scores for the Rosetta decoy oligomers

Rank of the native structure in each set:

  • 1csp: rank 1/1000
  • 1ctf: rank 1/1000
  • 1ail: rank 1/1000
  • 1bdo: rank 1/1000
  • 1pdo: rank 1/1000
  • 1kte: rank 1/1000
  • 1erv: rank 1/1000
  • 1gvp: rank 1/1000
  • 1utg: rank 1/1000
  • 1vls: rank 1/1000
  • 2acy: rank 1/1000
  • 1ris: rank 1/1000
  • 2fha: rank 1/1000

slide26

Comparisons with existing scoring functions

  • comparisons were made as Z-scores and percent of Rank 1 native structures
  • 4-state, lattice_ssfit, and lmds decoy sets (Samudrala and Levitt, 1999)
  • 23 proteins, 250-2000 decoys per protein

$\text{Z-score} = \dfrac{S_{\mathrm{native}} - \left(\sum_i S_i^{\mathrm{decoy}} / n\right)}{\sigma_{\mathrm{decoy}}}$

Scoring functions compared:

  • HL: Hinds and Levitt, 1992
  • BT: Betancourt and Thirumalai, 1999
  • GKS: Godzik, Kolinski, Skolnick, 1995
  • MJ: Miyazawa and Jernigan, 1996
  • TE: Tobi and Elber, 2000
  • BJ: Bahar and Jernigan, 1997
  • MSE: McConkey, Sobolev, Edelman, 2003

[Figure: average Z-score and % Rank 1 native structures for each scoring function]
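
The Z-score used in this comparison can be computed in a few lines. The sketch below assumes the population standard deviation of the decoy scores (the slides do not say whether the population or sample form was used), and the scores are invented. With a lower-is-better score, a well-separated native gives a negative value under this sign convention; its magnitude is the separation from the decoy mean in standard deviations.

```python
import statistics

def z_score(native_score, decoy_scores):
    """Distance of the native score from the decoy mean, in decoy standard deviations."""
    mean = statistics.fmean(decoy_scores)
    spread = statistics.pstdev(decoy_scores)  # population form: an assumption
    return (native_score - mean) / spread

# Hypothetical scores for one decoy set.
decoys = [-2.0, -4.0, -6.0]
z = z_score(-10.0, decoys)
print(z, abs(z))
```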

slide27

Sample decoy sets from CASP4

[Figure: four panels, T0111 (1e9i), T0117 (1j90), T0125 (1gak), and T0123 (1exs). x-axis: C-alpha RMSD (Å); y-axis: Score (-% native)]

slide28

Summary of decoy set testing

Performance of atom-atom contact scoring function on decoy sets. Z-score is the distance from the native structures to the mean of the decoy set measured in standard deviations.

decoy set        average #    rank 1       average     rank 1        average
                 decoys per   solutions,   Z-score,    solutions,    Z-score,
                 target       sub-units    sub-units   4° (native)   4° (native)

EMBL             1            25/25        n/a         25/25         n/a
CASP1            7            5/6          2.38        6/6           3.72
4state           665          7/7          3.86        7/7           4.08
lattice_ssfit    2000         8/8          8.17        8/8           9.21
lmds             453          6/8          4.96        8/8           7.80
CASP4            53           21/25        2.60        24/25*        3.01
Rosetta          1042         19/23        3.64        21/23*        4.38

Total                         101/112                  109/112

* missed structures: CASP4: 1exs; Rosetta: 1msi, 5icb.

slide29

Summary of atom-atom contact scoring

  • the Voronoi tessellation permits a precise and continuous quantification of atom-atom contacts
  • the contact scoring function qualitatively resembles energetic interaction potentials
  • the scoring function has a very high success rate for recognition of correctly folded protein structures, and has greater accuracy than other currently available scoring functions
  • Native protein structures could be identified in 97% of the decoy sets tested
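
The 97% figure follows from the totals reported in the decoy set summary (109 of 112 sets when the native 4° structure is scored, 101 of 112 for sub-units). A short check, with a hypothetical helper name:

```python
def success_rate(identified, total):
    """Percentage of decoy sets in which the native structure ranked first."""
    return 100.0 * identified / total

print(f"sub-units:   {success_rate(101, 112):.1f}%")
print(f"4° (native): {success_rate(109, 112):.1f}%")
```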
slide30

Observations from all-atom potential

  • backbone atoms behave similarly, independent of residue type
  • statistical potential less accurate for backbone atoms due to severe topology constraints (e.g. C--N interaction)
  • backbone N and O are almost always H-bonded or solvent exposed
  • there is a reasonably strong effect of neighboring atoms on the potential (e.g. Lysine NZ and Lysine CE)
slide31

But...

  • contact potential is still an all-atom potential
  • requires all atoms to be positioned for a structure to be scored
  • does not readily permit simplification of folding algorithms
  • a simplified potential would be useful in initial stages of protein folding.

The same methodology used to create the all-atom potential has been used to create a folding potential.

slide32

First attempt at simplification:

  • reduce number of contact types from 168 to less than 30
  • use residue types to define united atom types
  • assume backbone atoms behave similarly
  • GLY is treated as part of backbone
  • implicitly includes interactions with solvent
  • initial function remains area dependent
slide33

One possibility is a residue-residue potential:

A beads-on-a-string model of amino acid chain

Unfortunately, this approach has had only moderate success in the past.

slide34

A variation of beads on a string:

A united-atom model of the amino acid chain

  • backbone interactions are ignored (assumed to be hydrogen bonded)
  • approximates a residue-residue contact potential
slide35

A United Atom potential

  • uses the constrained Voronoi procedure as before, with a reduced number of atom types and excluding backbone interactions
  • counting contacts between side-chains (i.e. excluding backbone atoms) may better model certain interactions within proteins
  • e.g. interactions in beta-sheets:
slide36

United Atom potential #1

  • the UA potential was compared to the all-atom potential using the Rosetta decoy set
  • Area dependence was used: $\mathrm{Score} = \sum e_{i(j)} \, \mathrm{Area}_{i(j)}$

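[Figure: rank 1 results on the Rosetta decoy set for the all-atom and UA potentials; values shown: 21/23, 19/23, 19/23, 17/23]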

slide37

United Atom potentials

  • the initial United Atom function is still dependent on calculating contact areas between amino acid residues, so relies on knowledge of the position of side chains
  • a binary potential (residues in contact or not) would be more useful, as it doesn’t require coordinate information
  • A binary contact potential was developed, where side-chains were considered in contact if they shared > 8 Å² contact area.
  • Solvent contact was also enumerated, with 10-30 Å² = 1 contact, 30-50 Å² = 2 contacts, etc.
  • binary potential tested using Rosetta decoys
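
The binarization rules above can be written as two small functions. The thresholds come from the slide; the function names and the treatment of values exactly on a bin edge (30 Å² counted as 2 contacts) are assumptions, as is the extension of the 20 Å² bin width beyond the two bins listed.

```python
def side_chains_in_contact(shared_area, threshold=8.0):
    """True when two side-chains share more than `threshold` square Angstroms."""
    return shared_area > threshold

def solvent_contact_count(solvent_area):
    """Number of solvent contacts: 10-30 gives 1, 30-50 gives 2, and so on."""
    if solvent_area < 10.0:
        return 0
    return int((solvent_area - 10.0) // 20.0) + 1

# Hypothetical contact areas in square Angstroms.
for area in (5.0, 8.5, 25.0, 42.0, 55.0):
    print(area, side_chains_in_contact(area), solvent_contact_count(area))
```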
slide38

Binary United Atom potential

Sample data: 1msi from the Rosetta decoy set

[Figure: two panels for 1msi, all-atom potential and binary-UA potential. x-axis: C-alpha RMSD; y-axis: Score]

slide39

Binary United Atom potential

[Figure: rank 1 results on the Rosetta decoy set; values shown: 22/23, 21/23, 20/23, 19/23, 19/23, 17/23]

  • The binary UA potential recognized the native structure in all test sets except one (5pti, rank 2/1000)
slide40

Applications to Protein Contact maps

12 Å Cα contact map for 2csn, casein kinase-1

(Smith et al, 1997)

  • contact maps specify both secondary structure and inter-residue contacts
  • a detailed contact map provides sufficient information to reconstruct a 3-D structure
  • generation of a large set of feasible contact maps can reproduce near native structures
slide41

Protein structure prediction: Contact maps

  • Some issues with distance based contact maps:
  • typically use C-C distances - dependent on appropriate choice of cutoff
  • short cutoff distance biases map towards contacts within secondary structures
  • longer cutoff distance results in more contacts, and a noisy data set
  • C atoms in close proximity may have little interaction - e.g. n, n+2 residues in an alpha helix
  • contact with solvent not readily integrated
  • A tessellation procedure based on residue sidechains can circumvent some of these issues
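
The cutoff dependence of distance-based maps is easy to see in a sketch. The code below builds a boolean contact map from alpha-carbon coordinates and counts contacts at two cutoffs. The coordinates are an artificial straight chain with 3.8 Å spacing, chosen only to make the effect visible; the function names are hypothetical.

```python
import math

def distance_contact_map(ca_coords, cutoff):
    """Symmetric boolean matrix: True where two distinct residues have
    alpha-carbons within `cutoff` Angstroms of each other."""
    n = len(ca_coords)
    return [[i != j and math.dist(ca_coords[i], ca_coords[j]) <= cutoff
             for j in range(n)] for i in range(n)]

def count_contacts(contact_map):
    """Count each contacting pair once (upper triangle only)."""
    n = len(contact_map)
    return sum(contact_map[i][j] for i in range(n) for j in range(i + 1, n))

# Hypothetical coordinates: six residues on a straight line, 3.8 Angstroms apart.
chain = [(3.8 * k, 0.0, 0.0) for k in range(6)]
for cutoff in (8.0, 12.0):
    print(cutoff, count_contacts(distance_contact_map(chain, cutoff)))
```

Raising the cutoff adds pairs that are three residues apart along the chain, which illustrates how a longer cutoff inflates the map with contacts that say little about tertiary structure.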
slide42

Voronoi Contact maps

  • similar to C-C distance-based maps
  • uses a tesselation procedure to determine if residues are in contact
  • contacts can be subdivided by type: - sidechain contacts - backbone contacts - both sidechain and backbone
  • results in recognizable patterns of interaction for within and between secondary structures
  • it is possible to integrate solvent contact into this scheme as well
slide44

Voronoi Contact maps

C distance map

Voronoi map

slide45

Voronoi Contact map feature recognition

Using the contact preferences from residue-residue scores, it is possible to recognize regions of secondary structure, and interactions between secondary structure elements:

[Figures: contact map patterns for alpha helix, alpha-alpha, antiparallel beta-beta, beta-alpha, and parallel beta-beta]

slide46

Future work

  • Further refinement of binary contact scoring functions
    • incorporate different contact types
    • beta sheet vs. alpha helix
  • Development of search procedures to explore contact map space
  • Other unrelated stuff
    • proteomics
    • gene expression and divergence
    • physicochemical pattern recognition
slide47

Thanks!

....questions?