Predicting ligand binding sites on protein surface

Zengming Zhang, 2010-5-12

What is the binding site? • A concave (cleft- or hole-shaped) region on the protein surface • Like a key in a lock: • Key: ligand • Lock: protein • Keyhole: binding site

Why do we need to find binding sites? • First step in many structure analyses: • Functional/catalytic site prediction • Comparisons of protein atomic configurations • Docking calculations • Structure-based drug design • …

Algorithms for finding binding sites • Grid-based • The protein is embedded in a 3D grid. • Empty grid points are then defined as pockets if they satisfy a number of geometric or energetic conditions. • Sphere-based • A set of probe spheres is placed on the protein surface. • Pocket spheres are the probe spheres that satisfy a number of geometric conditions relative to the other generated probe spheres. • α-shape based • The α-shape is a subset of the Delaunay tessellation of the protein atoms that omits edges longer than the sum of the radii of the two atoms.

Algorithms for finding binding sites • Grid-based • POCKET, LIGSITE, LIGSITEcs, LIGSITEcsc, ConCavity, PocketPicker and GHECOM • Sphere-based • SURFNET, PASS, Q-SiteFinder, PHECOM • α-shape based • CAST, Fpocket

α-shape [Figure: the α-shape is the region enclosed by the black line; the other lines are edges of the Delaunay tessellation.]

Delaunay tessellations [Figure: no retained edge is longer than the sum of the radii of its two atoms.]

α-shape based: CAST • Computes a triangulation of the protein's surface atoms using α-shapes, then groups the triangles by letting small triangles flow toward neighboring larger triangles, which act as sinks.
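The "flow to a larger neighbor" grouping can be illustrated with a toy sketch. This is a simplification, not CAST's actual discrete-flow procedure (which works on empty Delaunay simplices and obtuse-angle rules); the function name `flow_groups` and the per-triangle `sizes` measure are hypothetical. Each triangle repeatedly moves to its largest strictly larger neighbor, and triangles that drain into the same sink form one group.

```python
def flow_groups(sizes, neighbors):
    """Group triangles by the sink each one drains into (toy flow model).

    sizes[t]     -- hypothetical size measure of triangle t (e.g. its area)
    neighbors[t] -- indices of triangles adjacent to t
    A triangle flows to its largest strictly larger neighbor; a triangle
    with no larger neighbor is a sink.
    """
    def find_sink(t):
        while True:
            larger = [n for n in neighbors[t] if sizes[n] > sizes[t]]
            if not larger:
                return t
            # Sizes strictly increase along the path, so the loop terminates.
            t = max(larger, key=lambda n: sizes[n])

    groups = {}
    for t in range(len(sizes)):
        groups.setdefault(find_sink(t), []).append(t)
    return groups


# Five triangles in a chain: 0-1-2-3-4
sizes = [1.0, 2.0, 5.0, 1.0, 4.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(flow_groups(sizes, neighbors))  # {2: [0, 1, 2, 3], 4: [4]}
```

Here triangles 0, 1 and 3 all drain into the large triangle 2, while triangle 4 has no larger neighbor and forms its own group.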

Grid-based • The protein is mapped onto a 3D grid, and the method looks for PSP (protein-solvent-protein) events at the grid points. • When a straight line drawn through an empty grid point is enclosed on both sides by protein atoms, that arrangement is called a PSP event. • Grid points with more than a threshold number of PSP events are defined as pockets.
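A minimal sketch of PSP counting on a boolean protein grid is shown below. The seven scan directions (three axes plus four cube diagonals) follow the LIGSITE style and are an assumption here, as are the function names and the `threshold` parameter. For every empty point, each line through it counts as a PSP event if walking in both directions hits a protein voxel before leaving the grid.

```python
import itertools

import numpy as np

# Assumed scan directions: 3 grid axes + 4 cube diagonals (LIGSITE-style).
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]


def hits_protein(protein, start, step):
    """Walk from `start` (exclusive) along `step`; True if a protein voxel is hit."""
    shape = np.array(protein.shape)
    p = np.array(start) + step
    while np.all(p >= 0) and np.all(p < shape):
        if protein[tuple(p)]:
            return True
        p = p + step
    return False


def psp_counts(protein):
    """Number of PSP events at every empty grid point (0 at protein points)."""
    counts = np.zeros(protein.shape, dtype=int)
    for idx in itertools.product(*(range(n) for n in protein.shape)):
        if protein[idx]:
            continue
        for d in DIRECTIONS:
            step = np.array(d)
            # PSP event: protein on both sides of the point along this line
            if hits_protein(protein, idx, step) and hits_protein(protein, idx, -step):
                counts[idx] += 1
    return counts


def psp_pockets(protein, threshold):
    """Pocket points: empty points with more than `threshold` PSP events."""
    return psp_counts(protein) > threshold


# A closed 5x5x5 box with a single empty voxel in the middle
box = np.ones((5, 5, 5), dtype=bool)
box[2, 2, 2] = False
print(psp_counts(box)[2, 2, 2])  # 7: enclosed along every direction
```

The brute-force loop is only meant for small grids; a production implementation would scan whole grid lines at once instead of walking from each point.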

Sphere-based • SURFNET: • Places a sphere (a "gap sphere") midway between two protein atoms, just touching both. • If the sphere contains any other atom, its radius is reduced until it just touches that atom. • The set of these gap spheres defines the pockets.
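The gap-sphere construction can be sketched as follows, assuming atoms are given as `(x, y, z, vdw_radius)` tuples. The limits `max_radius=4.0` and `min_radius=1.0` are hypothetical defaults chosen for illustration, not values taken from the slides. The sphere center stays fixed on the line between the two atoms while the radius shrinks to avoid intruding atoms.

```python
import math


def gap_sphere(atoms, i, j, max_radius=4.0, min_radius=1.0):
    """Gap sphere for the atom pair (i, j), or None if it is rejected.

    atoms: list of (x, y, z, vdw_radius) tuples.
    max_radius / min_radius: hypothetical acceptance limits.
    """
    xi, yi, zi, ri = atoms[i]
    xj, yj, zj, rj = atoms[j]
    ci, cj = (xi, yi, zi), (xj, yj, zj)
    d = math.dist(ci, cj)
    radius = (d - ri - rj) / 2.0  # half the gap between the two vdW surfaces
    if radius <= 0 or radius > max_radius:
        return None
    # Center on the i-j line, midway between the two vdW surfaces
    t = (ri + radius) / d
    center = tuple(a + t * (b - a) for a, b in zip(ci, cj))
    for k, (xk, yk, zk, rk) in enumerate(atoms):
        if k in (i, j):
            continue
        # Shrink the sphere until atom k no longer intrudes into it
        radius = min(radius, math.dist(center, (xk, yk, zk)) - rk)
    if radius < min_radius:
        return None
    return center, radius


def gap_spheres(atoms, **limits):
    """All accepted gap spheres over every pair of atoms."""
    spheres = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            sphere = gap_sphere(atoms, i, j, **limits)
            if sphere is not None:
                spheres.append(sphere)
    return spheres
```

For example, two atoms of radius 1 Å placed 6 Å apart give a sphere of radius 2 Å centered between them; a third atom near that center shrinks the radius or causes the sphere to be rejected.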

Grid-based: GHECOM • By Takeshi Kawabata • Kawabata T. (2010) Detection of multi-scale pockets on protein surfaces using mathematical morphology. Proteins, 78, 1195-1211 • Goal: to define pocket regions on the protein surface

Primary points: • A new definition of pockets using the basic operations of mathematical morphology • An algorithm for finding pockets • A useful dataset for algorithm testing • A new method for evaluating binding site predictions • Some useful findings about how ligands bind to binding sites

Some Background: • Multiscale pockets: • Deep and shallow pockets are calculated simultaneously. • "Multiscale pockets" need "multiscale probes": many probes of different sizes are used to define pockets. • "Size" and "depth" of pockets: • Two properties of pockets • His previous work, PHECOM, defined pockets using small and large spherical probes: • A pocket region is a space into which a small spherical probe can enter but a large spherical probe cannot.

Pocket definition • Mathematical Morphology • It is a theory used in the analysis of geometric features of digital images based on rigorous set theory. • Morphology can provide boundaries of objects, their skeletons, and their convex hulls. It is also useful for many pre- and post-processing techniques, especially in edge thinning and pruning.

mathematical morphology (con.) • Four operations: dilation, erosion, opening and closing [Figure: a: molecular shape X, the vdW volume of a protein; b: shape of the probe P; c: X ⊕ P, dilation of X by P; d: X ⊖ P, erosion of X by P; e: X ○ P, opening of X by P; f: X • P, closing of X by P]

mathematical morphology (con.) • Notation: • The translation of the shape X by the vector p (the p-translated X) is denoted by (X)_p and is defined by: $(X)_p = \{\, x + p \mid x \in X \,\}$

mathematical morphology (con.) • where $X^c$ is the complement of the shape X: $X^c = E^3 \setminus X$ • In other words, the closing of X by P is the space that the probe P cannot enter when any overlap between X and P is prohibited. • The closing of X by P is called the "molecular volume" of molecule X defined by probe P.
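The slides' own formulas for the four operations did not survive extraction. For reference, the standard textbook definitions are given below in the (X)_p notation. They assume the probe P is a sphere centered at the origin, so P equals its own reflection and the complement form of the closing holds; the paper's exact notation and equation numbers may differ.

```latex
\begin{aligned}
X \oplus P   &= \bigcup_{p \in P} (X)_p
  && \text{(dilation)} \\
X \ominus P  &= \bigcap_{p \in P} (X)_{-p} = \{\, x \mid (P)_x \subseteq X \,\}
  && \text{(erosion)} \\
X \circ P    &= (X \ominus P) \oplus P
  && \text{(opening)} \\
X \bullet P  &= (X \oplus P) \ominus P = \bigl[ (X^c \ominus P) \oplus P \bigr]^c
  && \text{(closing)}
\end{aligned}
```

The last form reads directly as the verbal definition: $(X^c \ominus P) \oplus P$ is the union of all probe positions that do not overlap X, and its complement is the region the probe cannot reach.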

Pocket definition (con.) • Eq. (12), introduced by Masuya and Doi, defines pockets using mathematical morphological operations:

Pocket definition (con.)
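On a digitized grid, the verbal definition "a space the small probe S can enter but the large probe P cannot" can be sketched as the set difference $(X \bullet P) \setminus (X \bullet S)$. This formalization is an assumption and may differ in detail from Eq. (12). The sketch uses `scipy.ndimage.binary_dilation` and `binary_erosion`; the helper names `ball`, `closing` and `pocket` are hypothetical.

```python
import numpy as np
from scipy import ndimage


def ball(radius_vox):
    """Digitized sphere: voxels whose centers lie within radius_vox of the origin."""
    n = int(np.ceil(radius_vox))
    z, y, x = np.ogrid[-n:n + 1, -n:n + 1, -n:n + 1]
    return x * x + y * y + z * z <= radius_vox ** 2


def closing(shape_x, probe):
    """Closing (dilation, then erosion), padded so the dilation stays inside the array."""
    pad = probe.shape[0]
    padded = np.pad(shape_x, pad)
    dilated = ndimage.binary_dilation(padded, structure=probe)
    closed = ndimage.binary_erosion(dilated, structure=probe)
    inner = tuple(slice(pad, pad + n) for n in shape_x.shape)
    return closed[inner]


def pocket(shape_x, r_small, r_large, grid_width=0.8):
    """Grid points inside the large-probe closing but outside the small-probe closing."""
    small = ball(r_small / grid_width)
    large = ball(r_large / grid_width)
    return closing(shape_x, large) & ~closing(shape_x, small)


# A solid block with a narrow slot cut into its top face (grid width 1 for clarity)
X = np.zeros((20, 20, 20), dtype=bool)
X[5:15, 5:15, 5:15] = True
X[5:15, 8:11, 10:15] = False
pk = pocket(X, r_small=0.5, r_large=3.0, grid_width=1.0)
```

With a radius of 3 voxels the large probe cannot enter the 3-voxel-wide slot, so the slot interior becomes pocket, while open space far from the block does not.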

Algorithm: Multiscale closing, or multiscale molecular volume: • Uses K types of large probe spheres P1, P2, …, PK and one small probe S, which must satisfy the opening condition [Eq. (16)]: $P_j \circ P_i = P_j$ for $i < j$. • The opening condition means that a large probe Pj can be reconstructed from a set of translated smaller probes Pi.

Algorithm (con.) • If the opening condition [Eq. (16)] is satisfied for all the probes {Pi}, then the multiscale molecular volumes are nested: $X \bullet P_i \subseteq X \bullet P_j$ for $i < j$. But …

Algorithm (con.) [Figure: an example of digitized probes that do not satisfy Eq. (16)]

Algorithm (con.) • Is the assumption WRONG? • NO! • The assumption of Eq. (16) is still safe. Real spheres in continuous space do satisfy it (a sphere of radius R is the union of the spheres of radius r ≤ R that it contains), and the digitized pseudo-spheres are used only as approximations of real spheres, so they can be treated as having the properties of real spheres.
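Whether a given pair of digitized probes violates Eq. (16) can be checked directly: the large probe satisfies the condition if opening it by the small probe returns it unchanged. The sketch below does this with `scipy.ndimage.binary_opening`; the helper names are hypothetical, and the digitization (voxel centers within the radius) is an assumption that may differ from GHECOM's.

```python
import numpy as np
from scipy import ndimage


def ball(radius_vox):
    """Digitized sphere: voxels whose centers lie within radius_vox of the origin."""
    n = int(np.ceil(radius_vox))
    z, y, x = np.ogrid[-n:n + 1, -n:n + 1, -n:n + 1]
    return x * x + y * y + z * z <= radius_vox ** 2


def satisfies_opening(p_large, p_small):
    """True if the digitized large probe equals its opening by the small probe."""
    pad = p_small.shape[0]
    padded = np.pad(p_large, pad)
    opened = ndimage.binary_opening(padded, structure=p_small)
    return bool(np.array_equal(opened, padded))


if __name__ == "__main__":
    grid_width = 0.8
    radii = [2.0, 2.5, 3.0, 3.5]  # the first few large-probe radii (Å) from the slides
    probes = [ball(r / grid_width) for r in radii]
    for j in range(len(radii)):
        for i in range(j):
            ok = satisfies_opening(probes[j], probes[i])
            print(f"P({radii[j]}) opened by P({radii[i]}) unchanged: {ok}")
```

Running the check over all 17 radii is straightforward but slow for the largest probes, since the cost of each opening grows with the volume of the small probe times the padded grid size.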

Algorithm (con.) • Only one index I(x) per 3D grid point is needed to store K types of dilations, molecular volumes and pockets: • Multiscale dilation: ID(x) • Multiscale closing, or multiscale molecular volume: IC(x) • Multiscale pocket: IP(x) • x is a 3D point; ID(x), IC(x) and IP(x) are integers determined by the 3D point x.

Algorithm (con.) • Rinaccess (minimum inaccessible radius): • The minimum radius of spheres that cannot touch the point x. • Used as a measure of shallowness for probes on the protein surface. • Rpocket (minimum pocket radius): • The minimum radius of spheres for which the point x lies within the pocket.
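A minimal sketch of how a single integer index can encode all K closings, and how Rinaccess follows from it, is given below. It assumes IC(x) is the smallest k with x in the closing of X by Pk (the slide's exact formula was not preserved); `closing_index`, `r_inaccess` and the sentinel values 0 and K+1 are hypothetical choices.

```python
import numpy as np
from scipy import ndimage


def ball(radius_vox):
    """Digitized sphere: voxels whose centers lie within radius_vox of the origin."""
    n = int(np.ceil(radius_vox))
    z, y, x = np.ogrid[-n:n + 1, -n:n + 1, -n:n + 1]
    return x * x + y * y + z * z <= radius_vox ** 2


def closing(shape_x, probe):
    """Closing (dilation, then erosion), padded so the dilation stays inside the array."""
    pad = probe.shape[0]
    padded = np.pad(shape_x, pad)
    closed = ndimage.binary_erosion(
        ndimage.binary_dilation(padded, structure=probe), structure=probe)
    return closed[tuple(slice(pad, pad + n) for n in shape_x.shape)]


def closing_index(shape_x, radii, grid_width=0.8):
    """I_C(x): smallest k (1-based) with x in the closing of X by P_k.

    0 marks protein points (x in X); len(radii) + 1 marks points that no
    probe in the list encloses.
    """
    index = np.full(shape_x.shape, len(radii) + 1, dtype=int)
    for k in range(len(radii), 0, -1):  # largest probe first ...
        closed = closing(shape_x, ball(radii[k - 1] / grid_width))
        index[closed] = k                # ... so smaller k overwrite larger ones
    index[shape_x] = 0
    return index


def r_inaccess(index, radii):
    """Radius of probe P_{I_C(x)}: NaN inside X, inf if never enclosed."""
    lookup = np.array([np.nan] + list(radii) + [np.inf])
    return lookup[index]


# Block with a 3-voxel-wide slot; grid width 1 so radii are in voxels
X = np.zeros((20, 20, 20), dtype=bool)
X[5:15, 5:15, 5:15] = True
X[5:15, 8:11, 10:15] = False
radii = [1.0, 3.0]
idx = closing_index(X, radii, grid_width=1.0)
r = r_inaccess(idx, radii)
```

A point in the slot is reachable by the radius-1 probe but not by the radius-3 probe, so its Rinaccess is 3; open space far from the block is never enclosed.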

Algorithm (con.) • Eqs. (17-19) suggest an efficient algorithm for calculating multiscale dilations, molecular volumes and pockets. • To implement it efficiently, a shell Hk is defined as the difference between the kth and (k-1)th probes: $H_k = P_k \setminus P_{k-1}$

Algorithm (con.) • The general strategy for an efficient algorithm is to process the shape X using a series of shells, progressing from the smallest shell to the largest (H1, H2, …, HK). • The algorithm is shown in Figure 4. • In this study, the grid width was set to 0.8 Å, the radius of the small probe S was set to 1.87 Å, and 17 types of large probes Pk were used, with radii 2.0, 2.5, 3.0, 3.5, …, 10.0 Å (in 0.5 Å steps).
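The shell strategy rests on a simple identity: because concentric probes are nested, $P_k = P_{k-1} \cup H_k$, and dilation distributes over a union of probes, so $X \oplus P_k = (X \oplus P_{k-1}) \cup (X \oplus H_k)$. The sketch below applies only the new shell at each step and stores the result in one index, as described for ID(x). It is an illustration of the idea, not the algorithm of Figure 4; the function names are hypothetical, and radii are given in voxels.

```python
import numpy as np


def shell_offsets(r_inner, r_outer):
    """Integer offsets d with r_inner^2 < |d|^2 <= r_outer^2 (r_inner=None: full ball)."""
    n = int(np.ceil(r_outer))
    lo = -1 if r_inner is None else r_inner ** 2
    rng = range(-n, n + 1)
    return [(i, j, k) for i in rng for j in rng for k in rng
            if lo < i * i + j * j + k * k <= r_outer ** 2]


def dilate(mask, offs):
    """Union of copies of `mask` shifted by each offset (caller pads against wrap-around)."""
    out = np.zeros_like(mask)
    for off in offs:
        out |= np.roll(mask, off, axis=(0, 1, 2))
    return out


def multiscale_dilation_index(shape_x, radii_vox):
    """I_D(x): smallest k (1-based) with x in the dilation of X by P_k; K+1 if none."""
    pad = int(np.ceil(max(radii_vox)))
    x = np.pad(shape_x, pad)
    index = np.full(x.shape, len(radii_vox) + 1, dtype=int)
    dilated = np.zeros_like(x)
    prev = None
    for k, r in enumerate(radii_vox, start=1):
        # Dilation by P_k = (dilation by P_{k-1}) united with (dilation by shell H_k)
        dilated = dilated | dilate(x, shell_offsets(prev, r))
        index[dilated & (index > k)] = k
        prev = r
    return index[tuple(slice(pad, pad + n) for n in shape_x.shape)]
```

With the slide's settings the radii in voxels would be `[r / 0.8 for r in (2.0, 2.5, ..., 10.0)]`. Each step only shifts X by the offsets of one thin shell instead of a whole sphere, which is where the savings come from.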

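Algorithm (con.) • Calculation of Rinaccess for ligand atoms: a measure of pocket shallowness for probes or atoms of binding ligands is useful for characterizing binding pockets. The Rinaccess of a ligand is the harmonic mean over the points of its shape L: $R_{inaccess}(L) = |L| \big/ \sum_{x \in L} R_{inaccess}(x)^{-1}$, where |L| is the number of points in the shape L of the ligand. Examples: A: 1/((1/3 + 1/4 + 1/4)/3) = 3.6 Å B: 1/((1/6 + 1/5 + 1/5)/3) = 5.3 Å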
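The two worked examples are harmonic means of the per-point Rinaccess values, which a few lines reproduce. The function name `ligand_rinaccess` is hypothetical.

```python
def ligand_rinaccess(point_values):
    """Harmonic mean of the Rinaccess values (Å) of the points in ligand shape L."""
    return len(point_values) / sum(1.0 / r for r in point_values)


print(round(ligand_rinaccess([3, 4, 4]), 1))  # 3.6  (ligand A)
print(round(ligand_rinaccess([6, 5, 5]), 1))  # 5.3  (ligand B)
```

Because it averages reciprocals, the harmonic mean is pulled toward the smallest values in the list.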

Algorithm (con.) • Calculation of Rinaccess and pocketness for protein atoms and residues: a measure of the depth of a protein atom or residue is useful for analyzing the relationship between ligand types and the surrounding protein atom types. To characterize the depth of protein atoms, they introduced the concept of an "accessible shell volume" around a part Y of the protein, defined with a spherical probe S, where Y is a part of the protein shape X (Y⊂X).

Algorithm (con.) • Pocketness of a protein atom or residue indicates how much it contributes to binding ligands. • Generally speaking, deep and large pockets tend to bind ligands. • Pocketness is therefore defined to reflect both the size and the depth of a pocket: a residue in a deeper and larger pocket has a larger pocketness value.

Algorithm (con.) • Clustering grid points and filtering out small clusters • Most ligands bind in the largest pockets. • Clustering pocket points and keeping only the large pocket clusters has been widely used by researchers. • In this study, because pockets have multiscale boundaries, a threshold value of Rpocket is needed to set the boundary between the pocket and the open outer space [shown in the "Results" section].
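The clustering and filtering step can be sketched with `scipy.ndimage.label`, assuming a per-point Rpocket grid (with `np.inf` for points in no pocket). The function name, the `threshold` and the `min_size` cutoff are hypothetical; points with Rpocket at or below the threshold are treated as pocket, then face-connected clusters smaller than `min_size` points are discarded.

```python
import numpy as np
from scipy import ndimage


def large_pocket_clusters(r_pocket, threshold, min_size):
    """Boolean mask of pocket clusters containing at least `min_size` grid points.

    r_pocket: per-point Rpocket values (np.inf where a point is in no pocket).
    threshold: hypothetical Rpocket cutoff separating pocket from open space.
    """
    mask = r_pocket <= threshold
    labels, _ = ndimage.label(mask)      # face-connected (6-neighbor) clusters
    sizes = np.bincount(labels.ravel())  # sizes[i] = number of points with label i
    keep = sizes >= min_size
    keep[0] = False                      # label 0 is the background
    return keep[labels]


r_pocket = np.full((6, 6, 6), np.inf)
r_pocket[0:2, 0:2, 0:2] = 3.0  # an 8-point pocket cluster
r_pocket[4, 4, 4] = 3.0        # an isolated single point
large = large_pocket_clusters(r_pocket, threshold=5.0, min_size=2)
```

Ranking the surviving clusters by size would then give the "largest pocket" candidates for ligand binding.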

Dataset • Prepared from the SCOP database, version 1.73 • Included protein chains with mutual sequence identities of 40% or less. • Excluded: • Small proteins with fewer than 40 residues • Protein chains with domains of class f, h, i, j or k • Total: 7375 chains • From these, extracted the chains bound to "proper" small molecules, excluding: • Tiny molecules • Unnatural precipitants: BOG, DTT, EPE, GOL, MES, MPD, MRD, PG4 and TRS • DNA and RNA (≥ 3 nucleotides) and proteins (≥ 10 amino acids) • Chains with more than 10,000 heavy atoms • As a result: • 1817 chains were included, each contacting at least one proper small molecule. • Only bound chains were used.

Evaluation of binding site predictions using recall-precision plots • For comparison, both the calculated pockets and the binding ligands were represented as grid points with 0.8 Å width; each point was checked to determine whether it was inside the pockets or the binding ligands. NP is the number of grid points in pockets, NL is the number of grid points overlapping with ligands, and NPL is the number of grid points in pockets that overlap with ligands. • Recall = NPL/NL; Precision = NPL/NP.
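Given boolean pocket and ligand grids of the same shape, NP, NL and NPL are simple counts. The sketch below computes recall and precision and, as a hypothetical extension, produces one recall-precision point per Rpocket threshold; the function names and the threshold sweep are assumptions for illustration.

```python
import numpy as np


def recall_precision(pocket, ligand):
    """Recall = NPL / NL and precision = NPL / NP for boolean grids of equal shape."""
    n_p = int(np.count_nonzero(pocket))
    n_l = int(np.count_nonzero(ligand))
    n_pl = int(np.count_nonzero(pocket & ligand))
    recall = n_pl / n_l if n_l else 0.0
    precision = n_pl / n_p if n_p else 0.0
    return recall, precision


def recall_precision_curve(r_pocket, ligand, thresholds):
    """One (recall, precision) pair per hypothetical Rpocket threshold."""
    return [recall_precision(r_pocket <= t, ligand) for t in thresholds]


pocket = np.zeros(10, dtype=bool)
pocket[:5] = True
ligand = np.zeros(10, dtype=bool)
ligand[3:5] = True
print(recall_precision(pocket, ligand))  # (1.0, 0.4)
```

Raising the threshold enlarges the predicted pockets, which typically increases recall at the expense of precision; plotting the pairs traces the recall-precision curve.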

Results [Figure: calculated pockets for PDB entry 1dwd]

Results

Useful discoveries • The majority of molecules binding in deep pockets were coenzymes • In contrast, adenine and guanine mononucleotides tend to bind in medium-to-shallow pockets • Macromolecules tend to bind in shallow pockets or protruded regions

Useful discoveries • In the typical binding pose of HEM molecules in the dataset, the aromatic atoms CBB and CMC face the protein, whereas the carboxyl atoms O1A and O2A face water. • In the ADP molecule, the atom N6 of the adenine ring and the atoms O1B, O2B and O3B of the phosphate group favored deep pockets, whereas sugar atoms such as O2' and O3' favored shallow pockets. The N6 side of the adenine and the phosphate termini face the protein, while the sugar atoms face water.

Summary: • A new definition of pockets using the basic operations of mathematical morphology • An efficient algorithm for finding pockets • A useful dataset for algorithm testing • A new method for evaluating binding site predictions with precision and recall • Some useful discoveries

Thanks! Any questions? Please feel free to ask me!
