Similarity Methods - PowerPoint PPT Presentation

Similarity methods l.jpg
1 / 28

  • Updated On :
  • Presentation posted in: General

Similarity Methods. C371 Fall 2004. Limitations of Substructure Searching/3D Pharmacophore Searching. Need to know what you are looking for Compound is either there or not Don’t get a feel for the relative ranking of the compounds Output size can be a problem. Similarity Searching.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

Download Presentation

Similarity Methods

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Similarity methods l.jpg

Similarity Methods


Fall 2004

Limitations of substructure searching 3d pharmacophore searching l.jpg

Limitations of Substructure Searching/3D Pharmacophore Searching

  • Need to know what you are looking for

  • Compound is either there or not

    • Don’t get a feel for the relative ranking of the compounds

  • Output size can be a problem

Similarity searching l.jpg

Similarity Searching

  • Look for compounds that are most similar to the query compound

  • Each compound in the database is ranked

  • In other application areas, the technique is known as pattern matching or signature analysis

Similar property principle l.jpg

Similar Property Principle

  • Structurally similar molecules usually have similar properties, e.g., biological activity

  • Known also as “neighborhood behavior”

  • Examples: morphine, codeine, heroin

  • Define: in silico

    • Using computational techniques as a substitute for or complement to experimental methods

Advantages of similarity searching l.jpg

Advantages of Similarity Searching

  • One known active compound becomes the search key

  • User sets the limits on output

  • Possible to re-cycle the top answers to find other possibilities

  • Subjective determination of the degree of similarity

Applications of similarity searching l.jpg

Applications of Similarity Searching

  • Evaluation of the uniqueness of proposed or newly synthesized compounds

  • Finding starting materials or intermediates in synthesis design

  • Handling of chemical reactions and mixtures

  • Finding the right chemicals for one’s needs, even if not sure what is needed.

Subjective nature of similarity searching l.jpg

Subjective Nature of Similarity Searching

  • No hard and fast rules

  • Numerical descriptors are used to compare molecules

  • A similarity coefficient is defined to quantify the degree of similarity

  • Similarity and dissimilarity rankings can be different in principle

Similarity and dissimilarity l.jpg

Similarity and Dissimilarity

“Consider two objects A and B, a is the number of features (characteristics) present in A and absent in B, b is the number of features absent in A and present in B, c is the number of features common to both objects, and d is the number of features absent from both objects. Thus, c and d measure the present and the absent matches, respectively, i.e., similarity; while a and b measure the corresponding mismatches, i.e., dissimilarity.” (Chemoinformatics; A Textbook (2003), p. 304)

2d similarity measures l.jpg

2D Similarity Measures

  • Commonly based on “fingerprints,” binary vectors with 1 indicating the presence of the fragment and 0 the absence

  • Could relate structural keys, hashed fingerprints, or continuous data (e.g., topological indexes that take into acount size, degree of branching, and overall shape)

Tanimoto coefficient l.jpg

Tanimoto Coefficient

  • Tanimoto Coefficient of similarity for Molecules A and B:

    SAB = c _

    a + b – c

    a = bits set to 1 in A, b = bits set to 1 in B, c = number of 1 bits common to both

    Range is 0 to 1.

    Value of 1 does not mean the molecules are identical.

Similarity coefficients l.jpg

Similarity Coefficients

  • Tanimoto coefficient is most widely used for binary fingerprints

  • Others:

    • Dice coefficient

    • Cosine similarity

    • Euclidean distance

    • Hamming distance

    • Soergel distance

Distance between pairs of molecules l.jpg

Distance Between Pairs of Molecules

  • Used to define dissimilarity of molecules

  • Regards a common absence of a feature as evidence of similarity

When is a distance coefficient a metric l.jpg

When is a distance coefficient a metric?

  • Distance values must be zero or positive

    • Distance from an object to itself must be zero

  • Distance values must be symmetric

  • Distance values must obey the triangle inequality: DAB ≤ DAC + DBC

  • Distance between non-identical objects must be greater than zero.

  • Dissimilarity = distance in the n-dimensional descriptor space

Size dependency of the measures l.jpg

Size Dependency of the Measures

  • Small molecules often have lower similarity values using Tanimoto

  • Tanimoto normalizes the degree of size in the denominator:

    SAB = c _

    a + b – c

Other 2d descriptor methods l.jpg

Other 2D Descriptor Methods

  • Similarity can be based on continuous whole molecule properties, e.g. logP, molar refractivity, topological indexes.

  • Usual approach is to use a distance coefficient, such as Euclidean distance.

Maximum common subgraph similarity l.jpg

Maximum Common Subgraph Similarity

  • Another approach: generate alignment between the molecules (mapping)

  • Define MCS: largest set of atoms and bonds in common between the two structures.

  • A Non-Polynomial- (NP)-complete problem: very computer intensive; in the worst case, the algorithm will have an exponential computational complexity

  • Tricks are used to cut down on the computer usage

Maximum common subgraph l.jpg

Maximum Common Subgraph

Reduced graph similarity l.jpg

Reduced Graph Similarity

  • A structure’s key features are condensed while retaining the connections between them

  • Cen ID structures with similar binding characteristics, but different underlying skeletons

  • Smaller number of nodes speeds up searching

3d similarity l.jpg

3D Similarity

  • Aim is often to identify structurally different molecules

  • 3D methods require consideration of the conformational properties of molecules

Tanimoto coefficient to find compounds similar to morphine l.jpg

Tanimoto Coefficient to Find Compounds Similar to Morphine

3d alignment independent methods l.jpg

3D: Alignment-Independent Methods

  • Descriptors: geometric atom pairs and their distances, valence and torsion angles, atom triplets

  • Consideration of conformational flexibility increases greatly the compute time

  • Relatively fewer pharmacophoric fingerprints than 2D fingerprints

    • Result: Low similarity values using Tanimoto

Pharmacophore l.jpg


  • A structural abstraction of the interactions between various functional group types in a compound

  • Described by a spatial representation of these groups as centers (or vertices) of geometrical polyhedra, together with pairwise distances between centers


3d alignment methods l.jpg

3D: Alignment Methods

  • Require consideration of the degrees of freedom related to the conformational flexibility of the molecules

  • Goal: determine the alignment where similarity measure is at a maximum

3d field based alignment methods l.jpg

3D: Field-Based Alignment Methods

  • Consideration of the electron density of the molecules

    • Requires quantum mechanical calculation: costly

    • Property not sufficiently discriminatory

3d gnomonic projection methods l.jpg

3D: Gnomonic Projection Methods

  • Molecule positioned at the center of a sphere and properties projected on the surface

  • Sphere approximated by a tessellated icosahedron or dodecahedron

  • Each triangular face is divided into a series of smaller triangles

Finding the optimal alignment l.jpg

Finding the Optimal Alignment

  • Need a mechanism for exploring the orientational (and conformational) degrees of freedon for determining the optimal alignment where the similarity is maximized

  • Methods: simplex algorithm, Monte Carlo methods, genetic alrogithms

Evaluation of similarity methods l.jpg

Evaluation of Similarity Methods

  • Generally, 2D methods are more effective that 3D

    • 2D methods may be artificially enhanced because of database characteristics (close analogs)

    • Incomplete handling of conformational flexibility in 3D databases

  • Best to use data fusion techniques, combining methods

For additional information l.jpg

For additional information . . .

  • See Dr. John Barnard’s lecture at:

  • Login