Similarity Methods

C371, Fall 2004

Limitations of Substructure Searching/3D Pharmacophore Searching

  • Need to know what you are looking for

  • Compound is either there or not

    • Don’t get a feel for the relative ranking of the compounds

  • Output size can be a problem

Similarity Searching

  • Look for compounds that are most similar to the query compound

  • Each compound in the database is ranked

  • In other application areas, the technique is known as pattern matching or signature analysis
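The ranking step can be sketched in a few lines of Python. This is only an illustration: it assumes each compound is described by a set of fragment identifiers and uses the Tanimoto coefficient (introduced later in this presentation) as the similarity measure. The compound names, fragment numbers, and the `rank_by_similarity` helper are made up for the example.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two feature sets (0.0 if both are empty)."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def rank_by_similarity(query, database, top_n=None):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, feats)) for name, feats in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n] if top_n is not None else scored


# Hypothetical data: integers stand for fragment identifiers.
database = {
    "cmpd_1": {1, 2, 3, 4},
    "cmpd_2": {1, 2, 7},
    "cmpd_3": {8, 9},
}
query = {1, 2, 3, 5}

# The user limits the output size with top_n.
ranking = rank_by_similarity(query, database, top_n=2)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Every compound receives a score, so the user can cut the list at any length rather than receiving an unranked hit set.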

Similar Property Principle

  • Structurally similar molecules usually have similar properties, e.g., biological activity

  • Known also as “neighborhood behavior”

  • Examples: morphine, codeine, heroin

  • Define: in silico

    • Using computational techniques as a substitute for or complement to experimental methods

Advantages of Similarity Searching

  • One known active compound becomes the search key

  • User sets the limits on output

  • Possible to re-cycle the top answers to find other possibilities

  • Subjective determination of the degree of similarity

Applications of Similarity Searching

  • Evaluation of the uniqueness of proposed or newly synthesized compounds

  • Finding starting materials or intermediates in synthesis design

  • Handling of chemical reactions and mixtures

  • Finding the right chemicals for one’s needs, even if not sure what is needed.

Subjective Nature of Similarity Searching

  • No hard and fast rules

  • Numerical descriptors are used to compare molecules

  • A similarity coefficient is defined to quantify the degree of similarity

  • Similarity and dissimilarity rankings can be different in principle

Similarity and Dissimilarity

“Consider two objects A and B, a is the number of features (characteristics) present in A and absent in B, b is the number of features absent in A and present in B, c is the number of features common to both objects, and d is the number of features absent from both objects. Thus, c and d measure the present and the absent matches, respectively, i.e., similarity; while a and b measure the corresponding mismatches, i.e., dissimilarity.” (Chemoinformatics: A Textbook (2003), p. 304)
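As a small illustration of the four counts in the quotation, the sketch below tallies a, b, c, and d for two binary feature vectors. The vectors and the function name `feature_counts` are invented for the example.

```python
def feature_counts(fp_a, fp_b):
    """Count a, b, c, d as defined in the quotation above.

    a: present in A, absent in B    b: absent in A, present in B
    c: present in both              d: absent from both
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("feature vectors must have the same length")
    pairs = list(zip(fp_a, fp_b))
    a = sum(1 for x, y in pairs if x == 1 and y == 0)
    b = sum(1 for x, y in pairs if x == 0 and y == 1)
    c = sum(1 for x, y in pairs if x == 1 and y == 1)
    d = sum(1 for x, y in pairs if x == 0 and y == 0)
    return a, b, c, d


# Hypothetical six-feature vectors for objects A and B.
fp_a = [1, 1, 0, 1, 0, 0]
fp_b = [1, 0, 0, 1, 1, 0]

a, b, c, d = feature_counts(fp_a, fp_b)
print(a, b, c, d)
```

The four counts always add up to the vector length; c and d are the matches, a and b the mismatches.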

2D Similarity Measures

  • Commonly based on “fingerprints,” binary vectors with 1 indicating the presence of the fragment and 0 the absence

  • Could relate structural keys, hashed fingerprints, or continuous data (e.g., topological indexes that take into account size, degree of branching, and overall shape)

Tanimoto Coefficient

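  • Tanimoto Coefficient of similarity for Molecules A and B:

    $$S_{AB} = \frac{c}{a + b - c}$$

    a = number of bits set to 1 in A, b = number of bits set to 1 in B, c = number of 1 bits common to both (note that a and b here count all set bits in each molecule, unlike the a and b of the preceding quotation)

    Range is 0 to 1.

    A value of 1 means the two fingerprints are identical; it does not mean the molecules are identical.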
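A minimal sketch of the calculation, using the slide's definitions of a, b, and c on fingerprints stored as lists of 0/1 values. The fingerprints are made up, and returning 0.0 when neither fingerprint has a bit set is a convention assumed here, not something the slide specifies.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two equal-length bit lists."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)                                 # bits set to 1 in A
    b = sum(fp_b)                                 # bits set to 1 in B
    c = sum(x & y for x, y in zip(fp_a, fp_b))    # 1 bits common to both
    denominator = a + b - c
    # Assumed convention: two all-zero fingerprints score 0.0.
    return c / denominator if denominator else 0.0


# Hypothetical 8-bit fingerprints.
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1, 0, 1, 0]

score = tanimoto(fp_a, fp_b)       # a = 4, b = 4, c = 3
same = tanimoto(fp_a, fp_a)        # identical fingerprints
print(score, same)
```

Two different molecules that happen to set exactly the same bits would also score 1, which is why a value of 1 says nothing definite about molecular identity.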

Similarity Coefficients

  • Tanimoto coefficient is most widely used for binary fingerprints

  • Others:

    • Dice coefficient

    • Cosine similarity

    • Euclidean distance

    • Hamming distance

    • Soergel distance
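For binary fingerprints, the listed coefficients can all be written in terms of the same three counts used for Tanimoto (a and b = bits set in A and B, c = bits in common). The sketch below uses the usual binary forms of these formulas; the fingerprints and the function name `coefficients` are assumptions made for the example.

```python
import math


def coefficients(fp_a, fp_b):
    """Common similarity and distance coefficients for two bit lists."""
    a = sum(fp_a)
    b = sum(fp_b)
    c = sum(x & y for x, y in zip(fp_a, fp_b))
    if a == 0 or b == 0:
        raise ValueError("each fingerprint needs at least one set bit")
    mismatches = a + b - 2 * c          # bits set in exactly one fingerprint
    return {
        "tanimoto": c / (a + b - c),            # similarity
        "dice": 2 * c / (a + b),                # similarity
        "cosine": c / math.sqrt(a * b),         # similarity
        "hamming": mismatches,                  # distance
        "euclidean": math.sqrt(mismatches),     # distance
        "soergel": mismatches / (a + b - c),    # distance, equals 1 - Tanimoto
    }


# Hypothetical 8-bit fingerprints.
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1, 0, 1, 0]

values = coefficients(fp_a, fp_b)
for name, value in values.items():
    print(f"{name}: {value:.3f}")
```

The first three are similarities (larger means more alike) and the last three are distances (larger means less alike), so they should not be mixed in one ranking without conversion.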

Distance Between Pairs of Molecules

  • Used to define dissimilarity of molecules

  • Regards a common absence of a feature as evidence of similarity

When is a distance coefficient a metric?

  • Distance values must be zero or positive

    • Distance from an object to itself must be zero

  • Distance values must be symmetric

  • Distance values must obey the triangle inequality: $D_{AB} \le D_{AC} + D_{BC}$

  • Distance between non-identical objects must be greater than zero.

  • Dissimilarity = distance in the n-dimensional descriptor space
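The conditions above can be checked numerically on a sample of objects. The sketch below does this for the Soergel distance (1 minus Tanimoto) and, for contrast, for 1 minus the Dice coefficient, which fails the triangle inequality on this sample. The feature sets and helper names are illustrative assumptions, and passing the check on a finite sample is not a proof that a coefficient is a metric.

```python
from itertools import product


def soergel(a, b):
    """Soergel distance between two feature sets (1 - Tanimoto)."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def dice_distance(a, b):
    """1 - Dice coefficient between two feature sets."""
    total = len(a) + len(b)
    return 1.0 - 2 * len(a & b) / total if total else 0.0


def is_metric_on(objects, dist, tol=1e-12):
    """Check the metric conditions for every pair and triple in `objects`."""
    for x, y in product(objects, repeat=2):
        d = dist(x, y)
        if d < 0:
            return False                      # must be zero or positive
        if x == y and abs(d) > tol:
            return False                      # distance to itself is zero
        if x != y and d <= tol:
            return False                      # distinct objects: d > 0
        if abs(d - dist(y, x)) > tol:
            return False                      # symmetry
    for x, y, z in product(objects, repeat=3):
        if dist(x, y) > dist(x, z) + dist(z, y) + tol:
            return False                      # triangle inequality
    return True


# Hypothetical feature sets.
sample = [frozenset({1}), frozenset({2}), frozenset({1, 2}), frozenset({1, 2, 3})]

soergel_ok = is_metric_on(sample, soergel)
dice_ok = is_metric_on(sample, dice_distance)
print(soergel_ok, dice_ok)
```

On this sample the Dice-based distance between {1} and {2} is 1, while the detour through {1, 2} costs only 1/3 + 1/3, so the triangle inequality is violated.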

Size Dependency of the Measures

  • Small molecules often have lower similarity values using Tanimoto

  • Tanimoto normalizes for molecular size through the denominator:

    $$S_{AB} = \frac{c}{a + b - c}$$
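A quick numerical illustration of the size effect, under the assumption that a small molecule sets few fingerprint bits and a large one sets many. In both hypothetical pairs below, the two molecules differ by exactly one feature each.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two non-empty feature sets."""
    common = len(a & b)
    return common / (len(a) + len(b) - common)


# Hypothetical small pair: 3 features each, 2 shared.
small_1, small_2 = {1, 2, 3}, {1, 2, 4}

# Hypothetical large pair: 30 features each, 29 shared.
large_1 = set(range(30))
large_2 = (large_1 - {29}) | {99}

small_score = tanimoto(small_1, small_2)   # 2 / (3 + 3 - 2)
large_score = tanimoto(large_1, large_2)   # 29 / (30 + 30 - 29)
print(f"small pair: {small_score:.3f}, large pair: {large_score:.3f}")
```

The same single-feature difference costs the small pair far more, because each bit carries a larger share of the denominator.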

Other 2D Descriptor Methods

  • Similarity can be based on continuous whole molecule properties, e.g. logP, molar refractivity, topological indexes.

  • Usual approach is to use a distance coefficient, such as Euclidean distance.
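A minimal sketch of a Euclidean distance on continuous descriptors. The descriptor values are invented, and autoscaling each descriptor to zero mean and unit variance before computing distances is an assumption added here so that a descriptor with a large numeric range does not dominate; the slide does not prescribe any scaling.

```python
import math
from statistics import mean, pstdev


def autoscale(rows):
    """Scale each descriptor column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns (standard deviation of zero).
    sds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical descriptors: [logP, molar refractivity].
names = ["query", "mol_1", "mol_2"]
raw = [[2.0, 40.0], [2.1, 42.0], [4.5, 80.0]]

scaled = dict(zip(names, autoscale(raw)))
distances = {n: euclidean(scaled["query"], scaled[n]) for n in names if n != "query"}
nearest = min(distances, key=distances.get)
print(nearest, distances)
```

Smaller distances mean greater similarity, so the ranking runs in the opposite direction to a similarity coefficient.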

Maximum Common Subgraph Similarity

  • Another approach: generate alignment between the molecules (mapping)

  • Define MCS: largest set of atoms and bonds in common between the two structures.

  • An NP-complete problem (NP: nondeterministic polynomial time): very computationally intensive; in the worst case, the algorithm will have exponential computational complexity

  • Tricks are used to cut down on the computer usage
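To make the cost concrete, here is a naive backtracking sketch for very small molecular graphs. It is not one of the optimized algorithms used in practice: it assumes atoms are compared by element label only, treats all bonds as equivalent, and searches for the largest common induced subgraph without requiring it to be connected. The graph encoding and the function name are hypothetical. The only "trick" included is a simple bound that abandons branches that cannot beat the best mapping found so far.

```python
def max_common_subgraph(atoms1, bonds1, atoms2, bonds2):
    """Return the largest atom mapping {index in 1: index in 2} found.

    atoms*: list of element labels; bonds*: set of frozenset({i, j}) pairs.
    """
    best = {}

    def extend(i, mapping):
        nonlocal best
        # Bound: even matching every remaining atom cannot beat `best`.
        if len(mapping) + (len(atoms1) - i) <= len(best):
            return
        if i == len(atoms1):
            best = dict(mapping)
            return
        for j in range(len(atoms2)):
            if j in mapping.values() or atoms1[i] != atoms2[j]:
                continue
            # Bonding to already-mapped atoms must agree in both graphs.
            if all((frozenset((i, p)) in bonds1) == (frozenset((j, q)) in bonds2)
                   for p, q in mapping.items()):
                mapping[i] = j
                extend(i + 1, mapping)
                del mapping[i]
        extend(i + 1, mapping)  # also try leaving atom i unmatched

    extend(0, {})
    return best


# Hypothetical heavy-atom graphs: C-C-O and C-C-N.
atoms_a, bonds_a = ["C", "C", "O"], {frozenset((0, 1)), frozenset((1, 2))}
atoms_b, bonds_b = ["C", "C", "N"], {frozenset((0, 1)), frozenset((1, 2))}

mapping = max_common_subgraph(atoms_a, bonds_a, atoms_b, bonds_b)
print(len(mapping), mapping)
```

The number of candidate mappings grows explosively with the number of atoms, which is why real implementations rely on much stronger pruning and heuristics.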

Reduced Graph Similarity

  • A structure’s key features are condensed while retaining the connections between them

  • Can identify structures with similar binding characteristics, but different underlying skeletons

  • Smaller number of nodes speeds up searching

3D Similarity

  • Aim is often to identify structurally different molecules

  • 3D methods require consideration of the conformational properties of molecules

3D: Alignment-Independent Methods

  • Descriptors: geometric atom pairs and their distances, valence and torsion angles, atom triplets

  • Consideration of conformational flexibility greatly increases the compute time

  • Relatively fewer pharmacophoric fingerprints than 2D fingerprints

    • Result: Low similarity values using Tanimoto

Pharmacophore

  • A structural abstraction of the interactions between various functional group types in a compound

  • Described by a spatial representation of these groups as centers (or vertices) of geometrical polyhedra, together with pairwise distances between centers
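The "centers plus pairwise distances" description maps directly onto a small data structure. In the sketch below the feature types and coordinates (in ångströms) are invented for illustration, and `pairwise_distances` is a hypothetical helper name.

```python
import math
from itertools import combinations

# Hypothetical pharmacophore: feature type -> (x, y, z) centre in angstroms.
centres = {
    "donor": (0.0, 0.0, 0.0),
    "acceptor": (3.0, 0.0, 0.0),
    "aromatic": (0.0, 4.0, 0.0),
}


def pairwise_distances(centres):
    """Distance between every pair of centres, keyed by a sorted name pair."""
    return {
        (p, q): math.dist(centres[p], centres[q])
        for p, q in combinations(sorted(centres), 2)
    }


distances = pairwise_distances(centres)
for (p, q), d in distances.items():
    print(f"{p} - {q}: {d:.1f}")
```

Because only inter-centre distances are stored, the description does not depend on how the molecule is positioned or oriented in space.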


3D: Alignment Methods

  • Require consideration of the degrees of freedom related to the conformational flexibility of the molecules

  • Goal: determine the alignment where similarity measure is at a maximum

3D: Field-Based Alignment Methods

  • Consideration of the electron density of the molecules

    • Requires quantum mechanical calculation: costly

    • Property not sufficiently discriminatory

3D: Gnomonic Projection Methods

  • Molecule positioned at the center of a sphere and properties projected on the surface

  • Sphere approximated by a tessellated icosahedron or dodecahedron

  • Each triangular face is divided into a series of smaller triangles

Finding the Optimal Alignment

  • Need a mechanism for exploring the orientational (and conformational) degrees of freedom to determine the optimal alignment, where the similarity is maximized

  • Methods: simplex algorithm, Monte Carlo methods, genetic algorithms
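As one possible illustration of the Monte Carlo option, the sketch below searches over rigid rotations only. It makes several simplifying assumptions that are not from the slides: molecules are bare point sets, both are centred on their centroids so translation is ignored, conformational flexibility is ignored, and similarity is a sum of Gaussian overlaps between points. All names and parameter values are hypothetical.

```python
import math
import random


def centre(points):
    """Translate points so that their centroid sits at the origin."""
    n = len(points)
    cx, cy, cz = (sum(p[k] for p in points) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


def rotate(points, ax, ay, az):
    """Rotate about the x, then y, then z axis (angles in radians)."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    out = []
    for x, y, z in points:
        y, z = cx * y - sx * z, sx * y + cx * z
        x, z = cy * x + sy * z, -sy * x + cy * z
        x, y = cz * x - sz * y, sz * x + cz * y
        out.append((x, y, z))
    return out


def overlap(p, q):
    """Similarity score: sum of Gaussian overlaps over all point pairs."""
    return sum(math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)))
               for u in p for v in q)


def monte_carlo_align(ref, mobile, steps=2000, step_size=0.3,
                      temperature=0.05, seed=0):
    """Metropolis search over rotation angles; returns (angles, score)."""
    rng = random.Random(seed)
    ref, mobile = centre(ref), centre(mobile)
    angles = [0.0, 0.0, 0.0]
    score = overlap(ref, rotate(mobile, *angles))
    best_angles, best_score = list(angles), score
    for _ in range(steps):
        trial = [a + rng.gauss(0.0, step_size) for a in angles]
        trial_score = overlap(ref, rotate(mobile, *trial))
        # Accept improvements always, worse moves with Boltzmann probability.
        if trial_score >= score or \
                rng.random() < math.exp((trial_score - score) / temperature):
            angles, score = trial, trial_score
            if score > best_score:
                best_angles, best_score = list(angles), score
    return best_angles, best_score


# Hypothetical reference points and a rotated copy to realign.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0),
       (0.0, 1.2, 0.8), (-1.0, 0.5, -0.6)]
mobile = rotate(ref, 0.8, -0.5, 1.1)

start_score = overlap(centre(ref), centre(mobile))
best_angles, best_score = monte_carlo_align(ref, mobile)
print(f"start {start_score:.3f} -> best {best_score:.3f}")
```

Occasionally accepting a worse orientation lets the search escape local maxima of the similarity function, which is the main reason to prefer a stochastic method over plain hill climbing here.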

Evaluation of Similarity Methods

  • Generally, 2D methods are more effective than 3D

    • 2D methods may be artificially enhanced because of database characteristics (close analogs)

    • Incomplete handling of conformational flexibility in 3D databases

  • Best to use data fusion techniques, combining methods
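One simple form of data fusion is to rank the database separately with each method and then combine the ranks. The sketch below uses a rank-sum rule; the choice of rule, the scores, and the function names are assumptions made for illustration, since the slides do not say which fusion scheme is meant.

```python
def ranks(scores):
    """Map each compound name to its rank (1 = most similar)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def fuse_by_rank_sum(*score_tables):
    """Order compounds by the sum of their ranks across all methods."""
    names = set(score_tables[0])
    if any(set(table) != names for table in score_tables):
        raise ValueError("every method must score the same compounds")
    rank_tables = [ranks(table) for table in score_tables]
    totals = {name: sum(r[name] for r in rank_tables) for name in names}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(names, key=lambda name: (totals[name], name))


# Hypothetical similarity scores from a 2D and a 3D method.
scores_2d = {"m1": 0.9, "m2": 0.6, "m3": 0.3}
scores_3d = {"m1": 0.5, "m2": 0.8, "m3": 0.7}

fused = fuse_by_rank_sum(scores_2d, scores_3d)
print(fused)
```

Working with ranks rather than raw scores avoids having to put coefficients from different methods on a common scale.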

For additional information . . .

  • See Dr. John Barnard’s lecture at: