Use of Machine Learning in Chemoinformatics

Irene Kouskoumvekaki, Associate Professor. December 12th, 2012. Biological Sequence Analysis course.


Presentation Transcript


  1. Use of Machine Learning in Chemoinformatics Irene Kouskoumvekaki Associate Professor December 12th, 2012 Biological Sequence Analysis course

  2. Major Aspects of Chemoinformatics
     • Databases: Development of databases for storage and retrieval of small molecule structures and their properties.
     • Machine learning: Training of Decision Trees, Neural Networks, Self Organizing Maps, etc. on molecular data.
     • Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, systems chemical biology networks…

  3. Machine Learning

  4. Machine learning classifiers

  5. Clustering: Self Organizing Maps Distinguishing molecules of different biological activities and finding a new lead structure

  6. Clustering: Self Organizing Maps Distinguishing molecules of different biological activities and finding a new lead structure

  7. Clustering: Self Organizing Maps Distinguishing molecules of different biological activities and finding a new lead structure

  8. Clustering: Self Organizing Maps Distinguishing molecules of different biological activities and finding a new lead structure

  9. Machine Learning

  10. Machine Learning (diagram labels: QSAR, Virtual Screening, Clustering, Classification, Molecular Structures, Properties, Molecular Descriptors)

  11. Different descriptor types
      • Simple feature counts (such as number of rotatable bonds or molecular weight)
      • Fragmental descriptors, which indicate the presence or absence (or count) of groups of atoms and substructures
      • Physicochemical properties (density, solubility, van der Waals volume)
      • Topological indices (size, branching, overall shape)
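
The "simple feature counts" category can be illustrated with a minimal sketch. It assumes a molecule is given as a dictionary of element counts, which is a hypothetical input format chosen for this example and not something the slides specify. The function names are likewise illustrative.

```python
# Average atomic masses (g/mol) for a few common elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(atom_counts):
    """Sum the atomic masses, weighted by the number of atoms of each element."""
    return sum(ATOMIC_MASS[element] * count for element, count in atom_counts.items())


def heavy_atom_count(atom_counts):
    """Count all non-hydrogen atoms."""
    return sum(count for element, count in atom_counts.items() if element != "H")


# Ethanol, C2H6O, written as element counts (hypothetical input format).
ethanol = {"C": 2, "H": 6, "O": 1}

descriptors = {
    "molecular_weight": round(molecular_weight(ethanol), 3),
    "heavy_atoms": heavy_atom_count(ethanol),
}
print(descriptors)
```

Descriptors such as rotatable bond counts or topological indices need the full molecular graph, so in practice they are computed with a chemoinformatics toolkit instead of from element counts alone.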

  12. Major Aspects of Chemoinformatics
      • Databases: Development of databases for storage and retrieval of small molecule structures and their properties.
      • Machine learning: Training of Decision Trees, Neural Networks, Self Organizing Maps, etc. on molecular data.
      • Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, systems chemical biology networks…

  13. Quantitative Structure-Activity Relationships (QSAR)
      In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or another given property, P).
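
One simple way to picture the fitting step is an ordinary least-squares line relating a single descriptor to the property P. This is only a sketch: the slides do not say which model form is used, and the numbers below are made-up toy values, not experimental data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the property P for a new descriptor value."""
    return slope * x + intercept


# Toy values invented for illustration only (NOT measured data):
# one descriptor value and one "activity" value per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [2.1, 3.9, 6.1, 7.9]

slope, intercept = fit_line(descriptor, activity)
print(f"P = {slope:.2f} * descriptor + {intercept:.2f}")
print(f"Predicted P for descriptor 5.0: {predict(5.0, slope, intercept):.2f}")
```

Real QSAR models typically combine many descriptors and are validated on molecules held out from the fit. The single-descriptor line just makes the idea of fitting descriptors to a measured property concrete.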

  14. Prediction of Solubility, ADME & Toxicity

  15. hERG Classification with SVM

  16. Evaluation of the data set

  17. Performance of SVM

  18. Performance of SVM

  19. Virtual screening
      • Computational techniques for a rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates.

  20. Similarity Search
      • Similar Property Principle: Molecules having similar structures and properties are expected to exhibit similar biological activity.
      • Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.

  21. Fingerprints-based Similarity Search
      • A widely used similarity search tool
      • Consists of descriptors encoded as bit strings
      • Bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient
      • MACCS fingerprints: 166 structural keys that answer questions of the type:
          • Is there a ring of size 4?
          • Is at least one F, Br, Cl, or I present?
        where the answer is either TRUE (1) or FALSE (0)
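
The structural-key idea can be sketched with just the two example questions from the slide. The molecule representation below (a set of element symbols plus a list of ring sizes) is a hypothetical simplification for illustration. Real MACCS keys are computed from the full structure by a chemoinformatics toolkit.

```python
HALOGENS = {"F", "Cl", "Br", "I"}


def has_ring_of_size_4(mol):
    """Key: is there a ring of size 4?"""
    return 4 in mol["ring_sizes"]


def has_halogen(mol):
    """Key: is at least one F, Br, Cl, or I present?"""
    return any(element in HALOGENS for element in mol["elements"])


# Ordered list of structural keys; each key contributes one bit.
KEYS = [has_ring_of_size_4, has_halogen]


def fingerprint(mol):
    """Encode the TRUE/FALSE answer of every key as 1/0."""
    return [1 if key(mol) else 0 for key in KEYS]


# Hypothetical, hand-written molecule descriptions.
cyclobutane = {"elements": {"C", "H"}, "ring_sizes": [4]}
chlorobenzene = {"elements": {"C", "H", "Cl"}, "ring_sizes": [6]}

print(fingerprint(cyclobutane))
print(fingerprint(chlorobenzene))
```

A full MACCS fingerprint works the same way, with 166 such questions giving a 166-bit string per molecule.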

  22. Tanimoto Similarity or 90% similarity
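
For two bit-string fingerprints, the Tanimoto coefficient is commonly defined as c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The sketch below applies that definition to short toy fingerprints and ranks a small library against a query. The fingerprints and molecule names are invented for illustration, and returning 0.0 for two all-zero fingerprints is a convention assumed here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two equal-length bit lists."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)                                      # bits set in A
    b = sum(fp_b)                                      # bits set in B
    c = sum(1 for x, y in zip(fp_a, fp_b) if x and y)  # bits set in both
    if a + b - c == 0:
        return 0.0  # assumed convention when neither fingerprint has a bit set
    return c / (a + b - c)


# Toy fingerprints invented for illustration (real MACCS keys have 166 bits).
query = [1, 1, 0, 1, 0, 1]
library = {
    "mol_1": [1, 1, 0, 1, 0, 0],
    "mol_2": [0, 0, 1, 0, 1, 0],
    "mol_3": [1, 1, 0, 1, 0, 1],
}

# Rank the library molecules by decreasing similarity to the query.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
for name in ranked:
    print(name, round(tanimoto(query, library[name]), 2))
```

A value of 1.0 means the two fingerprints are identical and 0.0 means they share no set bits. A similarity search keeps the database molecules whose score against the query exceeds a chosen threshold.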

  23. Similarity Search

  24. Questions?

  25. Molecular editors and viewers http://www.chemaxon.com/products/marvin/

  26. Molecular editors and viewers http://jmol.sourceforge.net/

  27. Format conversion http://cactus.nci.nih.gov/translate/
