Genetic Algorithm for de novo Design

A genetic algorithm for structure-based de novo design
Scott C.-H. Pegg, Jose J. Haresco & Irwin D. Kuntz
February 21, 2006

Method and applications
• Goal
  - Use the genetic algorithm in ADAPT to search for novel small molecules for combinatorial library generation
• Method
  - Initial generation
  - Fitness function
  - Breeding the next generation
• Applications
  - Cathepsin D: small chemical space, ligand unknown
  - Dihydrofolate reductase: larger chemical space, ligand known
  - HIV-1 RT: reproduce known structures
• Questions

Basic genetic algorithm: goal and algorithm choice
Goal: "develop new ligands using information from the three dimensional (3D) structure of a protein target without the prior knowledge of other ligands"
Challenge: bioactive compounds occupy regions of chemical space that are sparse, non-contiguous and difficult to predict a priori
Strategies already tried
- Find fragments that fit in parts of the active site and link them together
- Find a fragment that fits in part of the active site and grow it in a particular direction
Why a genetic algorithm is better
- Searches a large part of chemical space quickly
- Suited to finding adequate, not necessarily the best, solutions
- Works even when the fitness/scoring function is not known exactly
- Works with "whole"-molecule properties (ADME)
- Generates an ensemble of solutions to use as leads

Basic genetic algorithm: important steps
Initial generation
- Each compound is an acyclic graph of at most 16 fragments with at most 8 connections, written in SMILES notation
- Diverse set: pick a random fragment and add random fragments at random positions
- User-defined set: randomly swap at most 2 fragments of a user-supplied graph
Fitness pressure
- Score each compound with the DOCK 4.0 program (6-12 van der Waals and 1/r electrostatic terms), Daylight's ClogP program, molecular weight, number of rotatable bonds and number of hydrogen-bond donors/acceptors
- Select the best-scoring compounds as parents; the next generation may or may not retain these parents
Breeding
- Crossover: parents exchange sets of nodes, of equal or unequal size, chosen by random walks
- Mutation: a daughter's fragment identity or connectivity changes with a user-defined mutation probability
- Create the new generation, optionally add diversity, and repeat until the fitness goal is reached
[Figure: fitness scale from -40 to -35 labelled "more fit"; breeding options: equal/unequal mutation, single/multiple crossover]
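The loop above (random initial graphs, fitness pressure, crossover, mutation, repeat) can be sketched in a few dozen lines. This is a toy Python sketch, not ADAPT. Compounds are trees of hypothetical fragment labels instead of SMILES graphs. A made-up scoring function stands in for DOCK, ClogP and the other terms, with lower scores assumed fitter. Crossover swaps whole subtrees instead of random-walk node sets. The 8-connection limit is not modelled.

```python
import copy
import random

# Hypothetical fragment vocabulary; ADAPT works on SMILES fragments, not these labels.
FRAGMENTS = ["benzene", "amide", "pyridine", "carboxyl", "methyl", "hydroxyl"]
MAX_FRAGMENTS = 16  # size cap quoted on the slide


def random_tree(rng, n):
    """Start from one random fragment and attach n - 1 more at random nodes."""
    root = [rng.choice(FRAGMENTS), []]  # a node is [fragment_label, children]
    nodes = [root]
    for _ in range(n - 1):
        child = [rng.choice(FRAGMENTS), []]
        rng.choice(nodes)[1].append(child)
        nodes.append(child)
    return root


def all_nodes(tree):
    """Every node of the tree, collected depth-first."""
    out, stack = [], [tree]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node[1])
    return out


def toy_fitness(tree):
    """Made-up stand-in for the DOCK/ClogP/MW composite; lower is fitter (assumption)."""
    labels = [node[0] for node in all_nodes(tree)]
    size_penalty = 0.5 * max(0, len(labels) - 8)
    return -2.0 * labels.count("amide") - labels.count("pyridine") + size_penalty


def crossover(rng, a, b):
    """Replace a random subtree of a copy of `a` with a random subtree of `b`."""
    child = copy.deepcopy(a)
    donor = copy.deepcopy(rng.choice(all_nodes(b)))
    target = rng.choice(all_nodes(child))
    target[0], target[1] = donor[0], donor[1]
    return child


def mutate(rng, tree, rate):
    """Identity mutation per node, plus an occasional connectivity mutation."""
    child = copy.deepcopy(tree)
    for node in all_nodes(child):
        if rng.random() < rate:
            node[0] = rng.choice(FRAGMENTS)  # identity: swap the fragment label
    branch_owners = [node for node in all_nodes(child) if node[1]]
    if branch_owners and rng.random() < rate:
        owner = rng.choice(branch_owners)
        branch = owner[1].pop(rng.randrange(len(owner[1])))
        # connectivity: reattach the detached branch elsewhere; it is no longer
        # part of the tree, so this cannot create a cycle
        rng.choice(all_nodes(child))[1].append(branch)
    return child


def evolve(seed=0, pop_size=30, generations=40, n_parents=10, rate=0.1):
    rng = random.Random(seed)
    pop = [random_tree(rng, rng.randint(3, MAX_FRAGMENTS)) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=toy_fitness)[:n_parents]  # fitness pressure
        children = []
        while len(children) < pop_size - n_parents:
            a, b = rng.sample(parents, 2)
            child = mutate(rng, crossover(rng, a, b), rate)
            if len(all_nodes(child)) <= MAX_FRAGMENTS:  # enforce the size cap
                children.append(child)
        pop = parents + children  # option where the parents are retained
    return min(pop, key=toy_fitness)
```

For example, `evolve(seed=1)` returns the fittest tree found. Keeping the parents in each new generation is only one of the two options the slide mentions. It guarantees that the best score never gets worse.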

Applications
• Cathepsin D
  - Compare results of ADAPT applied to a combinatorial library with experimental binding-constant data on that library
  - Able to select fragments consistently present in the best inhibitors tested experimentally
  - Unable to directly reproduce known inhibitors because the DOCK score function differs from the experimental binding-constant surface
• Dihydrofolate reductase (DHFR)
  - Study the effect of seeding with a known ligand (here methotrexate) and of adding diversity to longer runs in a larger chemical-space search (on the order of 10^8 compounds)
  - Able to evolve compounds with motifs of known ligands
  - Able to do so faster when seeded with a known bioactive ligand
  - Able to do so more efficiently in one long run with added diversity than in multiple short runs
• HIV-1 reverse transcriptase
  - Rediscover specific structural themes of ligands that bind to this active site
  - Able to reproduce four known inhibitors in a "butterfly-like" shape (of 26 known inhibitors)
  - Able to reproduce a PETT-variant inhibitor like MSC-127, which was discovered experimentally by testing 750 PETT variants

Cathepsin D setup
- Experimentally studied ligands: 10 x 10 x 10 = 1000
- Size of potential chemical space: 25 fragments at 3 sites, 25^3 = 15625
- Performed 10 runs of 50 generations each
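The chemical-space sizes on this slide come from simple counting. With any of 25 fragments allowed at each of 3 sites there are 25^3 combinations. The experimental library instead multiplies its per-site choices (10 x 10 x 10). A minimal Python sketch of that arithmetic, assuming every fragment is allowed at every site (the function names are illustrative):

```python
from math import prod


def rgroup_space(n_fragments, n_sites):
    """Any of n_fragments at each of n_sites gives n_fragments ** n_sites compounds."""
    return n_fragments ** n_sites


def library_size(choices_per_site):
    """Size of a combinatorial library with a fixed fragment list per site."""
    return prod(choices_per_site)


space = rgroup_space(25, 3)          # 15625
tested = library_size([10, 10, 10])  # 1000
print(f"{tested} of {space} compounds tested ({tested / space:.1%})")
```

This prints `1000 of 15625 compounds tested (6.4%)`. The experimental library therefore covers only a small slice of the space ADAPT searches.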

Cathepsin D results
- Experimentally studied ligands: 10 x 10 x 10 = 1000
- Size of potential chemical space: 25 fragments at 3 sites, 25^3 = 15625
- Size of library generated by ADAPT: 8 x 7 x 7 = 392
- 4/7 inhibitors with 100 nM activity and 0/23 inhibitors with 330 nM activity were found in the ADAPT library
- Experimental data exist for only 24 of the 392 compounds in the ADAPT library
- A DOCK-only fitness function does not accurately map the binding-constant surface
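The slides do not say how the 8 x 7 x 7 library was assembled from the GA output. One plausible way is shown below as a hypothetical Python sketch. Each top-scoring compound is treated as a tuple with one fragment label per scaffold site. At each site, the fragments that recur in at least `min_count` compounds are kept, and the per-site list lengths then multiply to the library size. The labels, the data and the threshold are all invented for illustration.

```python
from collections import Counter
from math import prod


def focused_library(top_compounds, min_count):
    """For each site, keep fragments that recur in at least min_count top compounds."""
    n_sites = len(top_compounds[0])
    per_site = []
    for site in range(n_sites):
        counts = Counter(compound[site] for compound in top_compounds)
        per_site.append(sorted(f for f, k in counts.items() if k >= min_count))
    return per_site


# Hypothetical top-scoring GA compounds, one fragment label per site.
top = [("A1", "B1", "C1"), ("A1", "B2", "C1"), ("A2", "B1", "C3"),
       ("A1", "B1", "C2"), ("A3", "B2", "C1")]
sites = focused_library(top, min_count=2)
print(sites, prod(len(s) for s in sites))  # [['A1'], ['B1', 'B2'], ['C1']] 2
```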

DHFR setup
- 17 fragments from methotrexate plus others, 32 fragments in total
- 3-13 fragments allowed per compound
- Size of possible chemical space: 3.5 x 10^8 unique compounds
- 1 set of 10 runs to 30 generations, seeded with methotrexate
- 1 set of 10 runs to 30 generations, unseeded
- 1 set of 10 runs to 100 generations, unseeded
- 1 run of 1000 generations with diversity added every 200 generations
- 5 runs of 200 generations
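The long run "with diversity added every 200 generations" needs a rule for adding diversity, and the slides do not give one. The Python sketch below assumes that the worst-scoring quarter of the population is replaced by freshly generated random individuals. It uses a toy one-dimensional objective in place of the DOCK-based score, with lower assumed fitter. The fraction and all function names are hypothetical.

```python
import random


def inject_diversity(population, fitness, fraction, make_random, rng):
    """Replace the worst `fraction` of the population with new random individuals."""
    ranked = sorted(population, key=fitness)  # lower score = fitter (assumption)
    n_new = int(len(ranked) * fraction)
    survivors = ranked[: len(ranked) - n_new]
    return survivors + [make_random(rng) for _ in range(n_new)]


def toy_fitness(x):
    """Toy 1-D objective standing in for the DOCK-based score."""
    return (x - 3.0) ** 2


def random_individual(rng):
    return rng.uniform(-10.0, 10.0)


def long_run(generations=1000, every=200, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    injections = 0
    for gen in range(1, generations + 1):
        parents = sorted(pop, key=toy_fitness)[: pop_size // 2]
        children = [rng.choice(parents) + rng.gauss(0.0, 0.1)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
        if gen % every == 0 and gen < generations:  # at 200, 400, 600 and 800
            pop = inject_diversity(pop, toy_fitness, 0.25, random_individual, rng)
            injections += 1
    return min(pop, key=toy_fitness), injections
```

Replacing only the worst individuals keeps the best solutions while reintroducing random material. This is one simple way a single long run could avoid stalling where several short runs might.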

DHFR results
- 94% of solutions in the seeded runs scored better than the seed
- 0% of solutions in the 30-generation unseeded runs scored better
- 28% of solutions in the 100-generation unseeded runs scored better
- 96/98 structures in the seeded runs contained the pteridine fragment
- 21/100 structures in the 30-generation and 56/100 structures in the 100-generation unseeded runs contained the pteridine fragment
- The fitness score of the 1000-generation run was better than that of the five 200-generation runs
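The percentages above are simple ratios over the final population. The Python sketch below shows how such figures could be computed. It assumes lower scores are better, as with DOCK energies, and represents each structure as a set of fragment labels. The population and the seed score are invented for illustration, not taken from the study.

```python
def fraction_better(scores, seed_score):
    """Share of final solutions that beat the seed; lower is better (assumption)."""
    return sum(score < seed_score for score in scores) / len(scores)


def fraction_containing(compounds, fragment):
    """Share of final structures that contain a given fragment label."""
    return sum(fragment in compound for compound in compounds) / len(compounds)


# Hypothetical final population: (score, set of fragment labels).
final = [(-41.0, {"pteridine", "benzoyl"}), (-30.5, {"benzoyl", "glutamate"}),
         (-38.2, {"pteridine"}), (-35.0, {"pteridine", "glutamate"})]
scores = [score for score, _ in final]
print(fraction_better(scores, seed_score=-36.0))                        # 0.5
print(fraction_containing([frags for _, frags in final], "pteridine"))  # 0.75
```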

HIV-1 RT setup
- HIV-1 RT model bound to HEPT in a butterfly shape, with the binding site characterized by sphgen and GRID
- 65 fragments from 26 inhibitors, 4-12 fragments per compound, giving 5 x 10^12 compounds in the potential chemical space
- 5250 compounds generated from 10 runs
- Superimposing 5 inhibitors on one another defined the butterfly shape
- Ligands with 50% of their atoms in both wings count as butterfly-like
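The butterfly criterion can be made concrete as a geometric test. This Python sketch assumes each wing is approximated by a set of spheres. It also reads "50% of atoms in both wings" as at least half of the ligand's atoms falling in the two wings combined, with each wing holding at least one atom. Both the sphere representation and that reading are assumptions, not ADAPT's documented procedure, and the coordinates are invented.

```python
import math


def in_region(atom, spheres):
    """True if the atom lies inside any (center, radius) sphere of a wing."""
    return any(math.dist(atom, center) <= radius for center, radius in spheres)


def is_butterfly_like(atoms, wing_a, wing_b, threshold=0.5):
    """Assumed reading: at least `threshold` of atoms fall in the two wings
    combined, and each wing holds at least one atom."""
    in_a = [in_region(atom, wing_a) for atom in atoms]
    in_b = [in_region(atom, wing_b) for atom in atoms]
    in_either = sum(a or b for a, b in zip(in_a, in_b))
    return any(in_a) and any(in_b) and in_either / len(atoms) >= threshold


wing_a = [((0.0, 0.0, 0.0), 2.0)]
wing_b = [((6.0, 0.0, 0.0), 2.0)]
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0),
          (6.0, 0.0, 0.0), (5.5, 0.0, 0.0), (3.0, 3.0, 0.0)]
one_wing = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(is_butterfly_like(ligand, wing_a, wing_b))    # True
print(is_butterfly_like(one_wing, wing_a, wing_b))  # False
```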

HIV-1 RT results
- 4/26 known inhibitors were found in butterfly-like shape in the ADAPT library
- Efavirenz (Sustiva™), pyrrolobenzodiazepinone, PETT and diaryl sulfone-like scaffolds were found among the butterfly-like compounds
- Despite the lack of a structural motif in the initial, unseeded populations, ADAPT reproduced a geometric constraint, the ‘butterfly’ motif of known NNIs, using a molecular docking fitness function, even though such a function is not the best choice

Questions
• What are the time gains and costs of using this technique instead of a screening technique?
• How do you decide what to set the parameters to?
• How do you test the method or parameter set without a known set of ligands from which to build the fragment library?