1 / 26

Chemoinformatics tools for lead discovery

Chemoinformatics tools for lead discovery. Virtual screening. The huge numbers of molecules available in public and in-house databases means that there is a requirement for tools to rank compounds in order of decreasing probability of activity

afra
Download Presentation

Chemoinformatics tools for lead discovery

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chemoinformatics tools for lead discovery

  2. Virtual screening • The huge numbers of molecules available in public and in-house databases means that there is a requirement for tools to rank compounds in order of decreasing probability of activity • Range of methods available, varying in the sophistication and the amount of information that is available • Use of structure-based methods when an X-ray structure for the biological target is available • If this is not the case then must make use of information about (potential) ligands

  3. Ligand-Based Methods • Similarity searching • Use when just a single bioactive reference structure is available • 3D pharmacophore searching • Use when it has been possible to carry out a pharmacophore mapping exercise • Machine learning • Use when a fair number of both actives and inactives have been identified

  4. Similarity Searching: I • Use of a similarity measure to quantify the resemblance between an active target, or reference, structure and each database structure • The similar property principle means that high-ranked structures are likely to have similar activities to that of the target structure • Similarity searching hence provides an obvious way of following-up on an initial active

  5. Similarity searching: II • Many ways in which the similarity between two molecules can be computed • A similarity measure has two components • A structure representation • A similarity coefficient to compare two representations • Most operational systems use similarity measures based on 2D fingerprints and the Tanimoto coefficient

  6. Fragment bit-strings (fingerprints) • Originally developed for 2D substructure search • Similarity is based on the fragments common to two molecules • Widely used in both in-house and commercial chemoinformatics systems

  7. Similarity coefficients • Tanimoto coefficient for binary bit strings • C bits set in common between Target and Database Structure • T bits set in Target • D bits set in Database structure • Values between zero (no bits in common) and unity (identical fingerprints) • Many other, related similarity coefficients exist: • Tversky, cosine, Euclidean distance …..

  8. Combination of search techniques using data fusion: I • Tanimoto/fingerprint measures most common but many other types, e.g., • Computed physicochemical properties • 3D grid describing the molecular electrostatic potential • These reflect different molecular characteristics, so may enhance search performance by using more than one similarity measure • Data fusion or consensus scoring

  9. Combination of search techniques using data fusion: II • Combination of different rankings of the same sets of molecules • Two basic approaches • Generate rankings from the same molecule using different similarity measures (similarity fusion) • Generate rankings from different molecules using the same similarity measure but different molecules (group fusion)

  10. Reference 2 Reference 3 Groupfusion Reference 1

  11. After truncation to required rank Reference 2 Reference 1 Reference 3

  12. Fused Group Fusion Final truncated r = 1000 r = 2000 New Active Active found in earlier list

  13. Group fusion rules • Useful performance increases, even with just 10 actives, as better coverage of structural space with multiple starting points • Improvement most obvious when searching for heterogeneous sets of active molecules • Best results obtained by • Fusing similarity coefficient values, rather than ranks • Re-ranking using the maximum of the similarity values associated with each molecule • Using the Tanimoto coefficient

  14. Turbo similarity searching: I • Similar property principle: nearest neighbours are likely to exhibit the same activity as the reference structure • Group fusion improves the identification of active compounds • Potential for further enhancements by group fusion of rankings from the reference structure and from its assumed active nearest neighbours

  15. Turbo similarity searching: II REFERENCE STRUCTURE RANKED LIST NEAREST NEIGHBOURS

  16. Experimental details • MDL Drug Data report (MDDR) dataset of 11 activity classes and 102K structures • In all, 8294 actives in the 11 classes, with (turbo) similarity searches being carried out using each of these as the reference structure • ECFP_4 fingerprints/Tanimoto coefficient • MAX group fusion on similarity scores • Increasing numbers of nearest neighbours

  17. Numbers of nearest neighbours

  18. Upper and lower bound experiments

  19. Rationale for upper bound results • The true actives in the set of assumed actives yield significant enhancements in performance • The true inactives in the set of assumed actives have little effect on performance • Taken together, the two groups of compounds yield the observed net enhancement

  20. Use of machine-learning methods for similarity searching: I • Turbo similarity searching uses group fusion to enhance conventional similarity searching • Machine learning is a more powerful virtual screening tool than similarity searching • But requires a training-set containing known actives and inactives • Given an active reference structure, a training-set can be generated from • Using the k nearest neighbours of the reference structure as the actives • Using k randomly chosen, low-similarity compounds as the inactives

  21. Use of machine-learning methods for similarity searching: II

  22. Results: I • Experiments with the MDDR dataset show that group fusion better than machine-learning methods when averaged over all of the classes • However, group fusion inferior for the most diverse datasets (as measured by the mean pair-wise similarities) • Additional searches using 10 MDDR activity classes that are as structurally diverse as possible

  23. Results: II

  24. Conclusions: I • Fingerprint-based similarity searching using a known reference structure is long-established in chemoinformatics • When small numbers of actives are available, group fusion will enhance performance when the sought actives are structurally heterogeneous

  25. Conclusions: II • Can also enhance conventional similarity search, even if there is just a single active, by assuming that the nearest neighbours are also active • Can be effected in two ways • Use of group fusion to combine similarity rankings (overall best approach) • Use of substructural analysis to compute fragment weights (best with highly heterogeneous sets of actives)

  26. Soaluntukdipelajari • Tunjukkanperankhemoinformatikdalam QSAR • Data dananalisisdarikhemoinformatik yang banyakdigunakandalam docking molekul • Indekskemiripan (similarity index) banyakdigunakanuntukmendapatkaninformasitentangsenyawabaru yang memilikiaktivitasbiologistinggi. Jelaskansecarasingkatsistemkerjanya • Dalampenemuanobatbaru yang lebihpotensialdari yang sudahdikenal, banyakmemanfaatkankhemoinformatiks. Jelaskandenganbeberapacontoh. • ApaperbedaanpenggunaanKhemoinformatiksdalam QSAR, molecular docking dan similarity searching?

More Related