Machine learning in drug design

Machine Learning in Drug Design

David Page

Dept. of Biostatistics and Medical Informatics and Dept. of Computer Sciences


Collaborators

Michael Waddell

Paul Finn

Ashwin Srinivasan

John Shaughnessy

Bart Barlogie

Frank Zhan

Stephen Muggleton

Arno Spatola

Sean McIlwain

Brian Kay


Outline

  • Overview of Drug Design

  • How Machine Learning Fits Into the Process

  • Target Search: Single Nucleotide Polymorphisms (SNPs)

  • Machine Learning from Feature Vectors

    • Decision Trees

    • Support Vector Machines

    • Voting/Ensembles

  • Predicting Molecular Activity: Learning from Structure


Drugs Typically Are…

  • Small organic molecules that…

  • Modulate disease by binding to some target protein…

  • At a location that alters the protein’s behavior (e.g., antagonist or agonist).

  • Target protein might be human (e.g., ACE for blood pressure) or belong to invading organism (e.g., surface protein of a bacterium).


Example of Binding


So To Design a Drug:

  • Identify target protein

    • Knowledge of proteome/genome

    • Relevant biochemical pathways

  • Determine target site structure

    • Crystallography, NMR

    • Difficult if membrane-bound

  • Synthesize a molecule that will bind

    • Imperfect modeling of structure

    • Structures may change at binding

  • And even then…


Molecule Binds Target But May:

  • Bind too tightly or not tightly enough.

  • Be toxic.

  • Have other effects (side-effects) in the body.

  • Break down as soon as it gets into the body, or not leave the body soon enough.

  • Not get to where it should in the body (e.g., across the blood-brain barrier).

  • Not diffuse from gut to bloodstream.


And Every Body is Different:

  • Even if a molecule works in the test tube and in animal studies, it may not work in people (i.e., it may fail in clinical trials).

  • A molecule may work for some people but not others.

  • A molecule may cause harmful side-effects in some people but not others.


Places to use Machine Learning

  • Finding target proteins.

  • Inferring target site structure.

  • Predicting who will respond positively/negatively.


Healthy vs. Disease

[Figure: two panels, labeled Healthy and Diseased.]


If We Could Sequence DNA Quickly and Cheaply, We Could:

  • Sequence DNA of people taking a drug, and use ML to identify consistent differences between those who respond well and those who do not.

  • Sequence DNA of cancer cells and healthy cells, and use ML to detect dangerous mutations… proteins these genes code for may be useful targets.

  • Sequence DNA of people who get a disease and those who don’t, and use ML to determine genes related to susceptibility… proteins these genes code for may be useful targets.


Problem: Can’t Sequence Quickly

  • Can quickly test single positions where variation is common: Single Nucleotide Polymorphisms (SNPs).

  • Can quickly test degree to which every gene is being transcribed: Gene Expression Microarrays (e.g., Affymetrix Gene Chips™).

  • Can (moderately) quickly test which proteins are present in a sample (Proteomics).


Example of SNP Data


Problem: SNPs are not Genes

  • If we find a predictive SNP, it may not be part of a gene… we can only infer that the SNP is “near” a gene that may be involved in the disease.

  • Even if the SNP is part of a gene, it may be another nearby gene that is the key gene.


Problem: Even SNPs are Costly

  • Typically cannot use all known SNPs.

  • Can focus on a particular chromosome and area if knowledge permits that.

  • Can use a scattering of SNPs, since SNPs that are very close together may be redundant… use one SNP per haplotype block, or region where recombination is rare.


Why Machine Learning?

  • There may be no single SNP in our data that distinguishes disease vs. healthy.

  • A combination of SNPs may still be predictive, and that combination can itself offer insight.
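As a small illustration of this point, the sketch below builds a hypothetical toy data set (not real genotypes) in which disease occurs only when exactly one of two SNPs carries the variant allele. Scored alone, each SNP has zero information gain, yet the pair determines the label completely. The function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting on one discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: 1 = variant allele present, 0 = absent.
snp_a = [0, 0, 1, 1, 0, 0, 1, 1]
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]
disease = [a ^ b for a, b in zip(snp_a, snp_b)]  # exclusive-or of the two SNPs

gain_a = information_gain(snp_a, disease)
gain_b = information_gain(snp_b, disease)
gain_pair = information_gain(list(zip(snp_a, snp_b)), disease)
print(gain_a, gain_b, gain_pair)
```

An exclusive-or pattern is an extreme case chosen for clarity; real SNP interactions are unlikely to be this clean.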


Decision Trees in One Picture


Naïve Bayes in One Picture

[Figure: naïve Bayes diagram with nodes Age, SNP 1, SNP 2, …, SNP 3000.]


Voting Approach

  • Score SNPs using information gain.

  • Choose top 1% scoring SNPs.

  • To classify a new case, let these SNPs vote (majority or weighted majority vote).

  • We use majority vote here.
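The steps above can be sketched roughly as follows. The slides do not say how a single SNP casts its vote, so this sketch assumes each selected SNP votes for the class that was most common among training cases sharing its value. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def train_voters(X, y, fraction=0.01):
    """Keep the top-scoring fraction of features; each becomes a one-feature voter."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    ranked = sorted(range(n_features),
                    key=lambda j: information_gain(columns[j], y), reverse=True)
    keep = ranked[:max(1, round(fraction * n_features))]
    default = Counter(y).most_common(1)[0][0]  # used for feature values never seen in training
    voters = {}
    for j in keep:
        by_value = {}
        for x, label in zip(columns[j], y):
            by_value.setdefault(x, Counter())[label] += 1
        # Assumption: a voter backs the majority training class for its value.
        voters[j] = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    return voters, default

def predict(voters, default, row):
    """Unweighted majority vote of the selected features."""
    votes = Counter(table.get(row[j], default) for j, table in voters.items())
    return votes.most_common(1)[0][0]

# Hypothetical toy data: features 0-2 track the label, features 3-5 are noise.
y = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
]
X = [list(row) for row in zip(*columns)]

# The talk keeps the top 1%; with only six toy features we keep half instead.
voters, default = train_voters(X, y, fraction=0.5)
print(sorted(voters), predict(voters, default, [1, 1, 0, 0, 0, 0]))
```

A weighted vote would be a small change: weight each voter by its information gain rather than counting every vote equally.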


Task: Predict Early Onset Disease From SNP Data

  • Only 3000 SNPs, coarsely sampled over entire genome.

  • 80 patients (examples), 40 with early onset.

  • Using technology from Orchid.

  • Can a predictor be learned that performs significantly better than chance on unseen data?


Results

  • Use all data, only top 1% of features, or only top 10% of features (according to decision tree’s purity measure).

  • Use Trees, SVMs, Voting.

  • SVMs with top 10% achieve 71% accuracy. Significantly better than chance (50%).
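One way to check a claim like "significantly better than chance" is a one-sided binomial test against 50%. The slides do not say which test was used, and they report only the percentage, so the sketch below assumes 57 of the 80 patients were classified correctly and treats the predictions as independent trials, which is only approximately true for cross-validated predictions.

```python
from math import comb

def binomial_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))

n_patients = 80
n_correct = round(0.71 * n_patients)  # assumption: 71% of 80 rounds to 57 correct
p_value = binomial_tail(n_correct, n_patients)
print(f"{n_correct}/{n_patients} correct, one-sided p = {p_value:.2g}")
```

Under these assumptions the p-value comes out well below the usual 0.05 threshold.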


Lessons

  • Feature selection is important for performance.

  • Methodology note for machine learning specialists: feature selection must be repeated within each fold of cross-validation, or results will be overly optimistic.

  • SNP approach is promising… get funding to measure more SNPs.

  • More work on SVM comprehensibility.
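The methodology note can be illustrated with a minimal sketch on pure-noise data, where the true accuracy is 50%. It compares selecting features inside each fold with selecting them once on all the data. The scoring rule (difference in class means) and the simple voting classifier are stand-ins chosen for brevity, not the methods used in the study, and all names and data are hypothetical.

```python
import random

def class_means(column, labels):
    """Mean feature value among class-1 rows and among class-0 rows."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    return sum(pos) / max(1, len(pos)), sum(neg) / max(1, len(neg))

def select(X, y, n_keep):
    """Indices of the n_keep features whose class means differ most."""
    def score(j):
        pos_mean, neg_mean = class_means([row[j] for row in X], y)
        return abs(pos_mean - neg_mean)
    return sorted(range(len(X[0])), key=score, reverse=True)[:n_keep]

def train(X, y, features):
    """One voter per feature: +1 if value 1 goes with class 1, else -1."""
    model = []
    for j in features:
        pos_mean, neg_mean = class_means([row[j] for row in X], y)
        model.append((j, 1 if pos_mean >= neg_mean else -1))
    return model

def predict(model, row):
    """Unweighted majority vote over binary (0/1) features."""
    total = sum(sign * (2 * row[j] - 1) for j, sign in model)
    return 1 if total > 0 else 0

def cv_accuracy(X, y, k, n_keep, select_inside_fold, seed=0):
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    leaked = select(X, y, n_keep)  # has seen the labels of every held-out row
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in order if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        features = select(X_train, y_train, n_keep) if select_inside_fold else leaked
        model = train(X_train, y_train, features)
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)

# Hypothetical pure-noise data: 40 cases, 1000 random binary "SNPs",
# labels unrelated to any feature.
rng = random.Random(1)
X = [[rng.randint(0, 1) for _ in range(1000)] for _ in range(40)]
y = [i % 2 for i in range(40)]

honest = cv_accuracy(X, y, k=10, n_keep=15, select_inside_fold=True)
leaky = cv_accuracy(X, y, k=10, n_keep=15, select_inside_fold=False)
print(f"selection inside each fold:  {honest:.2f}")
print(f"selection on all data first: {leaky:.2f}")
```

On noise like this, the estimate with selection inside each fold should stay near chance, while selecting on all data first tends to look far better than it should.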


Places to use Machine Learning

  • Finding target proteins.

  • Inferring target site structure.

  • Predicting who will respond positively/negatively.


Typical Practice when Target Structure is Unknown

  • Test many molecules (1,000,000) to find some that bind to target (ligands).

  • Infer (induce) shape of target site from 3D structural similarities.

  • Shared 3D substructure is called a pharmacophore.

  • Perfect example of a machine learning task with spatial target.


An Example of Structure Learning

[Figure: example molecules, labeled Active and Inactive.]


Inductive Logic Programming

  • Represents data points in mathematical logic

  • Uses Background Knowledge

  • Returns results in logic


The Logical Representation of a Pharmacophore


Background Knowledge I

  • Information about atoms and bonds in the molecules

    atm(m1,a1,o,3,5.915800,-2.441200,1.799700).
    atm(m1,a2,c,3,0.574700,-2.773300,0.337600).
    atm(m1,a3,s,3,0.408000,-3.511700,-1.314000).
    bond(m1,a1,a2,1).
    bond(m1,a2,a3,1).


Background Knowledge II

  • Definition of distance equivalence

    dist(Drug,Atom1,Atom2,Dist,Error):-
        number(Error),
        coord(Drug,Atom1,X1,Y1,Z1),
        coord(Drug,Atom2,X2,Y2,Z2),
        euc_dist(p(X1,Y1,Z1),p(X2,Y2,Z2),Dist1),
        Diff is Dist1-Dist,
        absolute_value(Diff,E1),
        E1 =< Error.

    euc_dist(p(X1,Y1,Z1),p(X2,Y2,Z2),D):-
        Dsq is (X1-X2)^2+(Y1-Y2)^2+(Z1-Z2)^2,
        D is sqrt(Dsq).
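For readers less familiar with Prolog, here is a rough Python rendering of the same distance test. It assumes that the coordinates consulted by coord/5 are the last three arguments of the atm facts on the Background Knowledge I slide, which the slides do not state. The names `coords` and `within_distance` are illustrative.

```python
from math import dist as euclidean

# Coordinates copied from the atm facts for molecule m1 (assumed to be the
# last three arguments of each fact).
coords = {
    ("m1", "a1"): (5.915800, -2.441200, 1.799700),
    ("m1", "a2"): (0.574700, -2.773300, 0.337600),
    ("m1", "a3"): (0.408000, -3.511700, -1.314000),
}

def within_distance(drug, atom1, atom2, target, tolerance):
    """True if the two atoms are `target` apart, give or take `tolerance`."""
    actual = euclidean(coords[(drug, atom1)], coords[(drug, atom2)])
    return abs(actual - target) <= tolerance

print(within_distance("m1", "a1", "a2", 5.5, 0.75))
print(within_distance("m1", "a1", "a2", 7.0, 0.75))
```

Unlike the Prolog rule, this function only checks a given target distance. It does not enumerate atom pairs or bind unknown arguments.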


Central Idea: Generalize by searching a lattice


Conformational model

  • Conformational flexibility modelled as multiple conformations:

    • Sybyl RandomSearch

    • Catalyst


Pharmacophore description

  • Atom and site centred

    • Hydrogen bond donor

    • Hydrogen bond acceptor

    • Hydrophobe

    • Site points (limited at present)

    • User definable

  • Distance based


Example I: Dopamine agonists

  • Agonists taken from Martin data set on QSAR society web pages

  • Examples (5-50 conformations/molecule)


Pharmacophore identified

  • Molecule A has the desired activity if:

  • in conformation B molecule A contains a hydrogen acceptor at C, and

  • in conformation B molecule A contains a basic nitrogen group at D, and

  • the distance between C and D is 7.05966 +/- 0.75 Angstroms, and

  • in conformation B molecule A contains a hydrogen acceptor at E, and

  • the distance between C and E is 2.80871 +/- 0.75 Angstroms, and

  • the distance between D and E is 6.36846 +/- 0.75 Angstroms, and

  • in conformation B molecule A contains a hydrophobic group at F, and

  • the distance between C and F is 2.68136 +/- 0.75 Angstroms, and

  • the distance between D and F is 4.80399 +/- 0.75 Angstroms, and

  • the distance between E and F is 2.74602 +/- 0.75 Angstroms.
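A rule of this form can be checked against a molecule by brute force: for each conformation, try every assignment of its feature points to C, D, E, and F, and accept the molecule if some assignment satisfies all six distance constraints. The sketch below assumes each conformation is given as a list of typed 3D points. The type labels and the two toy conformations are hypothetical; only the distances and the 0.75 Angstrom tolerance come from the rule above.

```python
from itertools import product
from math import dist

# Point types and target distances (Angstroms) taken from the rule above.
POINT_TYPES = {"C": "acceptor", "D": "basic_nitrogen", "E": "acceptor", "F": "hydrophobe"}
DISTANCES = {
    ("C", "D"): 7.05966, ("C", "E"): 2.80871, ("D", "E"): 6.36846,
    ("C", "F"): 2.68136, ("D", "F"): 4.80399, ("E", "F"): 2.74602,
}

def matches(conformation, point_types, distances, tolerance):
    """True if some assignment of typed points meets every distance constraint.

    conformation: list of (feature_type, (x, y, z)) pairs.
    """
    names = list(point_types)
    candidates = [[xyz for ftype, xyz in conformation if ftype == point_types[n]]
                  for n in names]
    for combo in product(*candidates):
        placed = dict(zip(names, combo))
        if all(abs(dist(placed[a], placed[b]) - d) <= tolerance
               for (a, b), d in distances.items()):
            return True
    return False

def is_active(conformations, tolerance=0.75):
    """A molecule is predicted active if any one conformation matches."""
    return any(matches(c, POINT_TYPES, DISTANCES, tolerance) for c in conformations)

# Hypothetical conformations of one molecule (coordinates invented for illustration).
folded = [
    ("acceptor", (0.0, 0.0, 0.0)),
    ("basic_nitrogen", (7.06, 0.0, 0.0)),
    ("acceptor", (1.22, 2.53, 0.0)),
    ("hydrophobe", (2.41, 0.33, 1.13)),
]
stretched = [
    ("acceptor", (0.0, 0.0, 0.0)),
    ("basic_nitrogen", (12.0, 0.0, 0.0)),
    ("acceptor", (1.22, 2.53, 0.0)),
    ("hydrophobe", (2.41, 0.33, 1.13)),
]
print(is_active([stretched, folded]), is_active([stretched]))
```

Exhaustive assignment is fine for four points but grows quickly with more. The ILP system in the talk learns such rules, whereas this sketch only evaluates one.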


Example II: ACE inhibitors

  • 28 angiotensin converting enzyme inhibitors taken from literature

    • D. Mayer et al., J. Comput.-Aided Mol. Design, 1, 3-16, (1987)


Experiment 1

  • Attempt to identify pharmacophore using original Mayer et al. data (final conformations).

  • Initial failed attempt traced to “bugs” in background knowledge definition.

  • 4 pharmacophores found with corrected code (variations on common theme)


ACE pharmacophore

  • Molecule A is an ACE inhibitor if:

  • molecule A contains a zinc-site B,

  • molecule A contains a hydrogen acceptor C,

  • the distance between B and C is 7.899 +/- 0.750 Angstroms,

  • molecule A contains a hydrogen acceptor D,

  • the distance between B and D is 8.475 +/- 0.750 Angstroms,

  • the distance between C and D is 2.133 +/- 0.750 Angstroms,

  • molecule A contains a hydrogen acceptor E,

  • the distance between B and E is 4.891 +/- 0.750 Angstroms,

  • the distance between C and E is 3.114 +/- 0.750 Angstroms,

  • the distance between D and E is 3.753 +/- 0.750 Angstroms.


Pharmacophore discovered

[Figure: discovered pharmacophore with points labeled A, B, and C; legend: zinc site, H-bond acceptor.]


Experiment 2

  • Definition of “zinc ligand” added to background knowledge

    • based on crystallographic data

  • Multiple conformations

    • Sybyl RandomSearch


Experiment 2

[Figure: pharmacophore diagram with distances labeled 4.0, 3.9, and 7.3.]

  • Original pharmacophore rediscovered plus one other

    • different zinc ligand position

    • similar to alternative proposed by Ciba-Geigy


Example III: Thermolysin inhibitors

  • 10 inhibitors for which crystallographic data is available in PDB

  • Conformationally challenging molecules

  • Experimentally observed superposition


Key binding site interactions

[Figure: binding site diagram with labels Asn112-NH, O=C Asn112, Arg203-NH, O=C Ala113, Zn, S1’, and S2’.]


Interactions made by inhibitors


Pharmacophore Identification

  • Structures considered: 1HYT, 1THL, 1TLP, 1TMN, 2TMN, 4TLN, 4TMN, 5TLN, 5TMN, 6TMN

  • Conformational analysis using “Best” conformer generation in Catalyst

  • 98-251 conformations/molecule


Thermolysin Results

  • 10 5-point pharmacophores identified, falling into 2 groups (7/10 molecules)

    • 3 “acceptors”, 1 hydrophobe, 1 donor

    • 4 “acceptors”, 1 donor

  • Common core of Zn ligands, Arg203 and Asn112 interactions identified

  • Correct assignments of functional groups

  • Correct geometry to 1 Angstrom tolerance


Thermolysin results

  • Increasing tolerance to 1.5 Angstroms finds a common 6-point pharmacophore including one extra interaction


Example IV: Antibacterial peptides

  • Dataset of 11 pentapeptides showing activity against Pseudomonas aeruginosa

    • 6 actives (IC50 < 64 mg/ml)

    • 5 inactives


Pharmacophore Identified

A molecule M is active against Pseudomonas aeruginosa

if it has a conformation B such that:

M has a hydrophobic group C,

M has a hydrogen acceptor D,

the distance between C and D in conformation B is 11.7 Angstroms

M has a positively-charged atom E,

the distance between C and E in conformation B is 4 Angstroms

the distance between D and E in conformation B is 9.4 Angstroms

M has a positively-charged atom F,

the distance between C and F in conformation B is 11.1 Angstroms

the distance between D and F in conformation B is 12.6 Angstroms

the distance between E and F in conformation B is 8.7 Angstroms

Tolerance 1.5 Angstroms


Ongoing ILP developments (pharmacophores)

  • Continue to extend method validation

  • Extending to combinatorial mixtures

  • Quantitative models

  • Mixing different datatypes in background knowledge

  • Developing graphical front-end


Ongoing developments (Other)

  • Analysis of HTS datasets

  • Analysis of “drug-likeness”

  • Derivation of new descriptors

    • e.g., empirical binding functions

