Toxicological Relationships Between Proteins Obtained From a Molecular Spam Filter

Florian Nigsch & John Mitchell. F. Nigsch, et al., J. Chem. Inf. Model., 48, 306-318 (2008); F. Nigsch, et al., Toxicology and Applied Pharmacology, 231, 225-234 (2008).

Presentation Transcript


  1. Toxicological Relationships Between Proteins Obtained From a Molecular Spam Filter • Florian Nigsch & John Mitchell • F. Nigsch, et al., J. Chem. Inf. Model., 48, 306-318 (2008) • F. Nigsch, et al., Toxicology and Applied Pharmacology, 231, 225-234 (2008) • F. Nigsch, et al., J. Chem. Inf. Model., 48, 2313-2325 (2008)

  3. Toxicological Relationships Between Proteins Obtained From a Molecular Spam Filter • Florian Nigsch & John Mitchell • Now at Novartis Institutes, Boston

  4. Toxicological Relationships Between Proteins Obtained From a Molecular Spam Filter • Florian Nigsch & John Mitchell • Soon moving to University of St Andrews

  5. Spam • Unsolicited (commercial) email • Approx. 90% of all email traffic is spam • Where are the legitimate messages? • Filtering

  6. Analogy to Drug Discovery • Huge number of possible candidates • Virtual screening to help in selection process

  7. Properties of Drugs • High affinity to protein target • Soluble • Permeable • Absorbable • High bioavailability • Specific rate of metabolism • Renal/hepatic clearance? • Volume of distribution? • Low toxicity • Plasma protein binding? • Blood-Brain-Barrier penetration? • Dosage (once/twice daily?) • Synthetic accessibility • Formulation (important in development)

  8. Multiobjective Optimisation • Synthetic accessibility • Bioactivity • Solubility • Toxicity • Permeability • Metabolism • Huge number of candidates …

  9. Multiobjective Optimisation • Synthetic accessibility • Bioactivity • Solubility • Toxicity • Permeability • Metabolism • Huge number of candidates … most of which are useless! [Diagram labels: "Drug", "USELESS"]

  10. Winnow Algorithm • Invented in the late 1980s by Nick Littlestone to learn Boolean functions • Named after the verb “to winnow” (to separate grain from chaff) • Suited to high-dimensional input data • Used in Natural Language Processing (NLP), text classification, bioinformatics • Different varieties (regularised, Sparse Network of Winnows - SNoW, …) • Error-driven, linear-threshold, online algorithm
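The slide lists Winnow's properties but not its update rule. Below is a minimal sketch of the textbook binary Winnow2 rule over sparse Boolean features (the set of features that are "on", like words in an email or fragments in a molecule). The class name, the promotion factor alpha = 2 and the threshold theta = n are illustrative textbook choices, not settings taken from the cited papers, whose multi-class target-prediction models may differ in detail.

```python
from typing import Dict, Iterable, List, Set, Tuple


class Winnow:
    """Binary Winnow2 classifier over sparse Boolean features (illustrative sketch)."""

    def __init__(self, n_features: int, alpha: float = 2.0) -> None:
        self.alpha = alpha                   # multiplicative update factor, > 1
        self.threshold = float(n_features)   # common textbook choice: theta = n
        self.weights: Dict[int, float] = {}  # unseen features have weight 1.0

    def weight(self, feature: int) -> float:
        return self.weights.get(feature, 1.0)

    def score(self, active: Iterable[int]) -> float:
        # Linear threshold unit: sum the weights of the features that are on.
        return sum(self.weight(f) for f in active)

    def predict(self, active: Set[int]) -> bool:
        return self.score(active) >= self.threshold

    def update(self, active: Set[int], label: bool) -> None:
        # Error-driven and online: weights change only after a mistake,
        # and only for the features active in the current example.
        if self.predict(active) == label:
            return
        factor = self.alpha if label else 1.0 / self.alpha  # promote or demote
        for f in active:
            self.weights[f] = self.weight(f) * factor


# Toy target concept: positive when feature 0 or feature 3 is on.
examples: List[Tuple[Set[int], bool]] = [
    ({0}, True), ({1, 2}, False), ({3, 4}, True),
    ({4, 5}, False), ({0, 5}, True), ({1, 2, 4, 5}, False),
]
clf = Winnow(n_features=6)
for _ in range(10):  # several online passes over the example stream
    for active, label in examples:
        clf.update(active, label)
```

The multiplicative update lets relevant weights grow quickly while irrelevant ones stay small, which is what suits Winnow to very high-dimensional, sparse inputs.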

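  15. Feature Space - Chemical Space • A molecule is represented as a feature vector m = (f1, f2, …, fn) • Feature spaces of high dimensionality [Figure: feature spaces with axes f1, f2, f3 and labelled regions for COX2, CDK2, CDK1 and DHFR]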

  16. Combinations of Features Combinations of molecular features to account for synergies.
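The slide does not say how features are combined. A minimal sketch, assuming pairs of features: each unordered pair becomes an extra feature, so a linear classifier can give a co-occurrence its own weight, independent of the weights of the two single features. The function name and the restriction to pairs are assumptions for illustration.

```python
from itertools import combinations
from typing import FrozenSet, Set, Union

Feature = Union[str, FrozenSet[str]]


def with_pair_features(features: Set[str]) -> Set[Feature]:
    """Return the single features plus every unordered pair of them.

    A pair feature gives the co-occurrence of two features its own weight,
    separate from the weights of the individual features.
    """
    pairs = {frozenset(p) for p in combinations(sorted(features), 2)}
    return set(features) | pairs


expanded = with_pair_features({"a", "b", "c"})
```

With n single features this yields n + n(n-1)/2 features, so the feature space grows quadratically; a sparse, online learner such as Winnow copes with that growth comparatively well.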

  17. Features of Molecules Based on circular fingerprints
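To make "circular fingerprints" concrete, here is a toy sketch of atom-environment features: for every atom, count the atom types in concentric shells at bond distance 0, 1 and 2. Typing atoms by element symbol, the string encoding and the maximum radius of 2 are all simplifying assumptions; practical circular fingerprints use richer atom types and often hash the features.

```python
from collections import Counter, deque
from typing import Dict, List, Set


def bond_distances(adj: Dict[int, List[int]], start: int) -> Dict[int, int]:
    """Breadth-first search: number of bonds from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nb in adj[atom]:
            if nb not in dist:
                dist[nb] = dist[atom] + 1
                queue.append(nb)
    return dist


def circular_features(types: Dict[int, str], adj: Dict[int, List[int]],
                      max_radius: int = 2) -> Set[str]:
    """One feature per atom and radius: the atom-type counts in each shell
    (shell 0 = the atom itself, shell 1 = its neighbours, ...)."""
    features: Set[str] = set()
    for centre in types:
        shells: List[Counter] = [Counter() for _ in range(max_radius + 1)]
        for atom, d in bond_distances(adj, centre).items():
            if d <= max_radius:
                shells[d][types[atom]] += 1
        for radius in range(max_radius + 1):
            layers = [
                f"{d}:" + ",".join(f"{t}{n}" for t, n in sorted(shells[d].items()))
                for d in range(radius + 1)
            ]
            features.add("|".join(layers))
    return features


# Ethanol heavy atoms C0-C1-O2, typed here simply by element symbol.
ethanol = circular_features({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]})
```

Each feature string such as "0:C1|1:C1,O1" describes a carbon bonded to one carbon and one oxygen; a set of such strings is the sparse Boolean input a Winnow-style learner consumes.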

  18. Training Example

  19. Workflow For predicting protein targets

  20. Protein Target Prediction • Which protein does a given molecule bind to? • Virtual Screening • Multiple endpoint drugs - polypharmacology • New targets for existing drugs • Prediction of adverse drug reactions (ADR) • Computational toxicology

  25. Predicted Protein Targets • Selection of 233 classes from the MDL Drug Data Report • ~90,000 molecules • 15 independent 50%/50% splits into training and test sets
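The validation protocol on this slide, 15 independent random 50%/50% training/test splits, can be sketched as repeated hold-out. The function names, the seed and the `evaluate` callback are hypothetical; the split here is unstratified, and the slide does not state whether the original splits were stratified by target class.

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_half_splits(items: Sequence[T], n_splits: int = 15,
                       seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """n_splits independent random 50%/50% training/test partitions of items."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        shuffled = list(items)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        splits.append((shuffled[:half], shuffled[half:]))
    return splits


def repeated_holdout(items: Sequence[T],
                     evaluate: Callable[[List[T], List[T]], float],
                     n_splits: int = 15, seed: int = 0) -> Tuple[float, float]:
    """Train and evaluate once per split; return mean and standard deviation."""
    scores = [evaluate(train, test)
              for train, test in random_half_splits(items, n_splits, seed)]
    return mean(scores), stdev(scores)
```

Reporting the mean and spread over the splits is one plausible reading of a result quoted as "x% (±y%)"; the slides do not say which spread measure was used.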

  26. Predicted Protein Targets Cumulative probability of correct prediction within the three top-ranking predictions: 82.1% (±0.5%)
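The "cumulative probability of correct prediction within the three top-ranking predictions" is a top-k accuracy curve. A minimal sketch, assuming one known target per molecule and one score per candidate target; the function name and input layout are hypothetical.

```python
from typing import Dict, List, Sequence


def cumulative_top_k(predictions: Sequence[Dict[str, float]],
                     true_targets: Sequence[str], k: int = 3) -> List[float]:
    """Element r-1 is the fraction of molecules whose known target is among
    the r top-ranked predictions, for r = 1..k."""
    hits_at_rank = [0] * k  # molecules whose target sits exactly at rank r+1
    for scores, truth in zip(predictions, true_targets):
        ranked = sorted(scores, key=lambda t: scores[t], reverse=True)
        for rank, target in enumerate(ranked[:k]):
            if target == truth:
                hits_at_rank[rank] += 1
                break
    cumulative, running = [], 0
    for hits in hits_at_rank:
        running += hits
        cumulative.append(running / len(true_targets))
    return cumulative


predictions = [
    {"A": 0.9, "B": 0.5, "C": 0.1},
    {"A": 0.2, "B": 0.7, "C": 0.6},
    {"A": 0.1, "B": 0.3, "C": 0.8},
    {"A": 0.5, "B": 0.4, "C": 0.3},
]
curve = cumulative_top_k(predictions, ["A", "C", "A", "C"])
```

Here `curve` is [0.25, 0.5, 1.0]; the slide's 82.1% corresponds to the last entry of such a curve for k = 3, presumably averaged over the 15 splits.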

  27. Computational Toxicology • Annotated library of toxic molecules: MDL Toxicity database, ~150,000 molecules • Standardisation, then storage in a MySQL database • Model for target prediction: for each molecule we predict the likely target • Correlations between predicted protein targets and known toxicity codes: canonical (23) and full (490)

  28. Toxicological Relationships Outline (1) • Protein target prediction allows us to link (predictively) 150,000 toxic organic molecules to 233 specific protein targets • Each target is treated as a single protein, although it may be a set of related proteins • Toxicological databases link (experimentally) these 150,000 molecules to 23 toxicity classes • Combining these two sources of data matches the 233 proteins with the 23 toxicity classes

  32. Toxicological Relationships Outline (2) • For each protein target, we have a profile of association with the 23 toxicity classes • Proteins with similar profiles are clustered together • We demonstrate that these clusters of proteins can be physiologically meaningful.

  35. Predictions Obtained • The highest-ranking class is the predicted protein target (protein code j) • Each molecule carries known toxicity codes i, e.g. L70 - Changes in liver weight < Liver; Y07 - Hepatic microsomal oxidase < Enzyme inhibition; M30 - Other changes < Kidney, Ureter, and Bladder; L30 - Other changes < Liver • Result matrix R = (r_ij), with toxicity codes i as rows and protein targets j as columns; r_ij is incremented for each prediction
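Building the result matrix R amounts to counting, for every molecule, its top-ranked predicted protein against each of its known toxicity codes. A minimal sketch: the record layout, the function name and the protein labels "P1"/"P2" are hypothetical, and counting each distinct code once per molecule (with no later normalisation) is an assumption the slide does not confirm.

```python
from typing import Iterable, List, Sequence, Tuple


def build_result_matrix(records: Iterable[Tuple[str, Sequence[str]]],
                        tox_codes: Sequence[str],
                        proteins: Sequence[str]) -> List[List[int]]:
    """Count matrix R with R[i][j] = number of molecules carrying toxicity
    code i whose highest-ranking predicted target is protein j.

    Each record is (predicted_protein, known_toxicity_codes) for one molecule.
    """
    row = {code: i for i, code in enumerate(tox_codes)}
    col = {protein: j for j, protein in enumerate(proteins)}
    R = [[0] * len(proteins) for _ in tox_codes]
    for predicted, codes in records:
        j = col[predicted]
        for code in set(codes):  # count each code once per molecule
            R[row[code]][j] += 1
    return R


records = [
    ("P1", ["L70", "Y07"]),
    ("P2", ["L70"]),
    ("P1", ["L30"]),
    ("P2", ["M30", "Y07"]),
]
R = build_result_matrix(records, ["L30", "L70", "M30", "Y07"], ["P1", "P2"])
```

Each row of R is then a toxicity class's profile over proteins, and each column a protein's profile over toxicity classes.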

  36. Toxicity Annotations • Full toxicity codes (490), e.g. Y41: Glycolytic < Metabolism (intermediary) < Biochemical • Canonical toxicity codes (23)

  37. Proteins by Toxicity • Cardiac (G): Kainic acid receptor, Adrenergic alpha2, Phosphodiesterase III, cAMP Phosphodiesterase, O6-Alkylguanine-DNA alkyltransferase • Vascular (H): Angiotensin II AT2, Dopamine (D2), Bombesin, Adrenergic alpha2, 5-HT antagonist

  38. Top 5 Proteins by Toxicity • 68 distinct proteins for 23 toxicity classes, i.e., about 3.0 proteins per canonical toxicity code • Proteins and their connectivities: Lanosterol 14alpha-Methyl Demethylase (5), Glucose-6-phosphate Translocase (4), IL-6 (4), Benzodiazepine Antagonist (3), Kainic Acid Receptor (3)
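The top-5 lists and connectivities can be read off the result matrix: take the k highest-scoring proteins in each toxicity-class row, then count how many lists each protein appears in. With the slide's numbers, 68 distinct proteins / 23 classes ≈ 2.96, reported as 3.0. The sketch ranks by raw counts, which is an assumption; the matrix, class and protein names below are toy values.

```python
from collections import Counter
from typing import Dict, List, Sequence


def top_proteins(R: Sequence[Sequence[float]], tox_classes: Sequence[str],
                 proteins: Sequence[str], k: int = 5) -> Dict[str, List[str]]:
    """For each toxicity class (row of R), the k proteins with the highest counts."""
    top: Dict[str, List[str]] = {}
    for i, cls in enumerate(tox_classes):
        ranked = sorted(range(len(proteins)), key=lambda j: R[i][j], reverse=True)
        top[cls] = [proteins[j] for j in ranked[:k]]
    return top


def connectivities(top: Dict[str, List[str]]) -> Counter:
    """How many toxicity classes list each protein among their top k."""
    return Counter(p for plist in top.values() for p in plist)


R = [[5, 3, 1, 0], [0, 4, 2, 6], [1, 1, 9, 2]]
top = top_proteins(R, ["Cardiac", "Vascular", "Liver"], ["A", "B", "C", "D"], k=2)
conn = connectivities(top)
proteins_per_class = len(conn) / len(top)  # distinct proteins / classes
```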

  39. Clustering of Toxicity Classes Clustering of toxicity classes: based on predicted protein associations from the result matrix
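The slides do not name the clustering method. As one common choice, this sketch performs average-linkage agglomerative clustering on the distance 1 - r, taking a precomputed correlation matrix between toxicity classes (or between proteins) as input. The method, the distance and the function name are assumptions.

```python
from itertools import combinations
from typing import List, Sequence


def average_linkage(corr: Sequence[Sequence[float]],
                    n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering from a correlation matrix (illustrative).

    Item distance is 1 - r; cluster distance is the mean item distance over
    all cross-cluster pairs (average linkage). The closest two clusters are
    merged until n_clusters remain.
    """
    def cluster_distance(xs: List[int], ys: List[int]) -> float:
        total = sum(1.0 - corr[a][b] for a in xs for b in ys)
        return total / (len(xs) * len(ys))

    clusters: List[List[int]] = [[i] for i in range(len(corr))]
    while len(clusters) > n_clusters:
        _, x, y = min(
            (cluster_distance(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


corr = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
groups = average_linkage(corr, 2)
```

This naive loop is slow for large inputs but adequate for 23 toxicity classes or 233 proteins.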

  40. Correlation Between Toxicity Classes Correlations between toxicity classes: 23 by 23 correlation matrix

  41. Correlation Between Proteins Correlations between proteins: 233 by 233 correlation matrix
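Both correlation matrices come from the same result matrix: correlating rows gives toxicity class against toxicity class (23 by 23), and correlating columns gives protein against protein (233 by 233). This sketch assumes Pearson correlation on raw counts; the slides specify neither the measure nor any normalisation, and the toy matrix is hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(profiles: Sequence[Sequence[float]]) -> List[List[float]]:
    """All-against-all correlations between profile vectors."""
    return [[pearson(p, q) for q in profiles] for p in profiles]


# Toy result matrix: rows = toxicity classes, columns = proteins.
R = [
    [1, 2, 3],
    [2, 4, 6],
    [3, 2, 1],
]
tox_corr = correlation_matrix(R)                               # classes x classes
protein_corr = correlation_matrix([list(c) for c in zip(*R)])  # proteins x proteins
```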

  42. Correlation Between Proteins Correlations between proteins: 233 by 233 correlation matrix Cluster 1 (proteins 6-11)

  43. We will look at two specific clusters: Cluster 1 and Cluster 4.

  44. Cluster 1 • Cluster 1 (proteins 6-11): Carbonic Anhydrase Inhibitor, Estrogen Receptor Modulator, LHRH Agonist, Aromatase Inhibitor, Cysteine Protease Inhibitor, DHFR Inhibitor • Within-cluster correlation (without auto-correlation) r = 0.95

  46. Cluster 1 • Carbonic Anhydrase Inhibitor, Estrogen Receptor Modulator, LHRH Agonist, Aromatase Inhibitor, Cysteine Protease Inhibitor, DHFR Inhibitor • Within-cluster correlation (without auto-correlation) r = 0.95 • Proteins involved in breast cancer
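A within-cluster correlation "without auto-correlation" can be computed as the mean correlation over all distinct pairs of members, skipping the diagonal, where every protein correlates perfectly with itself. This sketch assumes that is how the reported r = 0.95 was obtained. Indices are 0-based positions in the correlation matrix, and the matrix values are toy numbers.

```python
from itertools import combinations
from typing import Sequence


def within_cluster_correlation(corr: Sequence[Sequence[float]],
                               members: Sequence[int]) -> float:
    """Mean correlation over all distinct pairs of cluster members.

    The diagonal (auto-correlation, always 1) is skipped so that it does not
    inflate the average; for a symmetric matrix, averaging over unordered
    pairs equals averaging over all off-diagonal entries.
    """
    pairs = list(combinations(members, 2))
    if not pairs:
        raise ValueError("a cluster needs at least two members")
    return sum(corr[a][b] for a, b in pairs) / len(pairs)


corr = [
    [1.00, 0.90, 0.96],
    [0.90, 1.00, 0.99],
    [0.96, 0.99, 1.00],
]
r = within_cluster_correlation(corr, [0, 1, 2])
```

Including the diagonal would add a 1.0 for every member and bias small clusters upwards, which is presumably why the slide excludes it.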

  47. Cluster 1 Proteins involved in breast cancer
