
Evolutionary and Agent-based Search / Exploration in Chemical Library and De Novo Design

Ian Parmee, Advanced Computational Technologies (ACT) and Bristol UWE



  1. Evolutionary and Agent-based Search / Exploration in Chemical Library and De Novo Design. Ian Parmee, Advanced Computational Technologies (ACT) and Bristol UWE

  2. Early design characterised by: • human-centric concept formulation and development; • uncertainty due to lack of data / information / knowledge and poor problem definition; • correspondingly low-fidelity computational representation (if any, initially); • design activity that extends across multiple domains and disciplines. Current CAD characterised by: • low-level, inflexible user interaction; • need for high product definition; • high-fidelity design representation; • domain-specific tools that do not exploit cross-domain experience.

  3. Interdisciplinary knowledge and expertise can help us understand the highly complex, multi-layered generic relationships inherent within decision-making processes. • A degree of human-based subjective evaluation is required during early design and decision-making, where uncertainty and associated risk are prime characteristics. • These are inherently people-centred processes.

  4. Fundamentally human-based activities but small increases in problem complexity result in numbers of possible alternatives rapidly moving beyond our cognitive capabilities. • Advanced generic computational environments required that support comparative assessment whilst user maintains an integral, significant role. • Such systems may, for instance, capture and utilise human knowledge and experience to better define machine-based problem representation. • Over-automation, human exclusion and associated loss of valuable and essential information must be avoided

  5. Interactive Intelligent Systems for Design • Major potential: one aspect is the use of evolutionary computation (EC) algorithms as gatherers of optimal / high-quality design information. • Information can be collated and integrated with human-based decision-making processes. • The approach can capture designer experiential knowledge and intuition within further evolutionary search (knowledge discovery). • Supports exploration outside of initial constraint, objective and variable bounds.

  6. Iterative designer/machine-based refinement of design space. Provide succinct graphical representation of complex relationships from various perspectives. Immersive system? - designer part of iterative loop

  7. Preliminary military airframe design (BAE Systems) • Characterised by uncertain requirements and fuzzy objectives. • Long gestation periods between initial design brief and realisation of product. • Changes in operational requirements and technological advances. • Demand for a responsive, highly flexible strategy: design change and compromise are inherent features.

  8. Cluster-oriented Genetic Algorithms (COGAs) • COGAs identify high-performance regions of complex preliminary / conceptual design spaces. • The approach can be utilised to generate highly relevant design information relating to single-objective, multi-objective and constrained problem domains.

  9. Cluster-oriented Genetic Algorithm • Initially built for multimodal continuous spaces • Stochastic evolutionary approach • Motivated by the need to identify multiple high-performing regions • Adaptive filtering for continuous extraction of high-performing regions into the final clustering set (FCS)
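
The adaptive-filter idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the filter standardises fitness within each generation and copies every solution whose standardised score meets a threshold into the FCS. The names `adaptive_filter`, `update_fcs` and `threshold` are hypothetical.

```python
from statistics import mean, pstdev


def adaptive_filter(population, fitness, threshold=1.0):
    """Return the members whose standardised fitness is >= threshold.

    Assumption: fitness is standardised per generation (z-score), so the
    same threshold adapts to the current population's spread.
    """
    scores = [fitness(x) for x in population]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        # All members score the same, so no region stands out.
        return []
    return [x for x, s in zip(population, scores) if (s - mu) / sigma >= threshold]


def update_fcs(fcs, population, fitness, threshold=1.0):
    """Add this generation's filtered solutions to the final clustering set."""
    fcs.update(tuple(x) for x in adaptive_filter(population, fitness, threshold))
    return fcs


if __name__ == "__main__":
    generation = [(0.0,), (1.0,), (2.0,), (10.0,)]
    fcs = update_fcs(set(), generation, lambda x: x[0])
    print(sorted(fcs))
```

Lowering `threshold` admits lower-performance solutions and so widens the extracted regions; raising it tightens them.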

  10. Good solution-set cover of identified regions supports extraction of relevant design information. Information is mined, processed and presented to the designer in succinct graphics. Information relates to: solution robustness, revision of variable ranges, conversion from variable to fixed parameters, degree of objective conflict, and sensitivity of objectives to each variable. Solutions describing high-performance (HP) regions can be projected onto any 2D variable hyperplane:

  11. Projection of COGA single- and multi-objective output on 2D variable hyperplanes (data from a nine-variable problem). Panels: single objective; multiple objectives. It is not feasible to search through all 2D hyperplanes, so a single graphic is required.

  12. A combination of box plot representation and parallel co-ordinates relating to all objectives contains several layers of design information. The Parallel Co-ordinate Box Plot (PCBP) [Parmee and Johnson, 2004] was developed to provide all of this information in a single graphic:

  13. PCBP of solution distribution of each objective across each variable

  14. Utilising PCBP Information • Using information available within the PCBP, the designer can: • i) Identify variables least affecting solution performance across the full set of objectives (i.e. variables where the full axes relating to each objective overlap, e.g. 1, 2, 3, 6 and 9). • ii) Further identify minimum objective conflict, i.e. where box plots relating to each objective largely overlap. • iii) Identify conflicting objectives, evident from diverse distribution of box plots along some axes.

  15. iv) View related variable hyperplane projections for a different perspective on the spatial distribution of the objectives' high-performance regions. Access to such hyperplanes is driven by simple clicking operations on selected variable axes. v) View projections of high-performance regions on objective space: direct mapping between variable and objective space.

  16. Projection of ATR / FR regions on objective space

  17. vi) View approximate Pareto frontiers generated from the non-dominated sorting of HP region solutions. Panels: distribution of solutions for objectives ATR1 and FR against the SPEA-II Pareto front; distribution of solutions for objectives ATR1 and SEP1 against the SPEA-II Pareto front.

  18. Approximate Pareto frontiers are generated through non-dominated solution sorting within the objectives' HP regions. Pareto approximations are all that are required during conceptual design. COGA potentially offers more information than standard Pareto-based methods.
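
Non-dominated sorting of a solution set can be illustrated with a short Python sketch. It assumes every objective is to be maximised and that each solution is represented only by its tuple of objective values; `dominates` and `non_dominated` are illustrative names, and the quadratic scan is chosen for clarity rather than speed.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    hp_region = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
    print(non_dominated(hp_region))
```

Applied to the solutions held in the HP regions, the surviving points form the approximate Pareto front for that pair of objectives.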

  19. Relaxing the COGA adaptive filter allows lower performance solutions into the HP regions and ‘closes the gap’ in the approximate Ferry Range / Specific Excess Power Pareto front – also results in mutually inclusive region between all three objectives

  20. Off-line analysis of search data supports iterative designer/machine-based refinement of design space. • Immersive system? - designer part of iterative loop • Effect upon emerging solutions identified during iterative development of design space.

  21. Transferring this Technology into Drug Design Processes • Drug design involves synthesis of a small subset of compounds (focussed libraries) from the many possibilities that can be assembled from available reagents. • In collaboration with Evotec OAI, we have developed COGA approaches to aid the selection of compounds for synthesis using in silico models for specific drug characteristics.

  22. Motivation • Accelerate the process of finding potential drug candidate molecules (leads). • Finding such leads will improve the hit rate during actual assaying (HTS). • A desktop tool will facilitate knowledge discovery through mining of high-quality information related to i) complex reagent interaction and ii) objective sensitivities.

  23. Initial Project Aims • To assess utility of evolutionary engineering design techniques and strategies within a drug design environment. • To appropriately modify and develop those techniques and strategies offering best potential. • To integrate developed techniques with multiparameter medicinal chemistry optimisation.

  24. Methodology • Cluster-oriented Genetic Algorithm (search and exploratory tool) • Experiments with SIMILARITY, QSAR-SOL, QSAR-LOGP, docking, etc. • Constraint handling • Preference-based multi-objective approach

  25. Experimental Set Up • Target molecule: CN(Cc1cnc2nc(N)nc(N)c2n1)c3ccc(cc3)C(=O)NC(CCC(O)=O)C(O)=O • Aromatic Acid (R1) + Primary Amine (R2) → Amino Acids • Independent variables = 2 • Search space size = 400 x 400

  26. Now necessary to search across extensive, highly discrete spaces described by reagent combinations. COGA concepts have been introduced - major modification in terms of basic representation and genetic operators that further promote exploration. Range of in-silico objective functions have been utilised (e.g. similarity, QSAR, docking etc)

  27. Two deliverables required from the developed software: i) Identification of individual best-performing reagent combinations across range of objectives and in terms of a pre-defined target – Optimisation. ii) Identification of those reagents that offer best potential in terms of a range of objectives and a pre-defined target leading to development of focussed libraries - Search and Exploration

  28. Generation of Focussed Libraries • Development of additional search heuristics to ensure that COGA identifies as many high-performance reactants as possible in a robust manner whilst minimising the number of evaluations required. • Test set comprising a fully enumerated 400 x 400 library (primary amines + aromatic acids) used as the benchmark in studies: 160,000 possible molecules. • Fitness function = Tanimoto similarity.
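
For reference, the Tanimoto similarity of two fingerprints A and B is |A ∩ B| / |A ∪ B|. The sketch below is a toy illustration, not the project's fitness code: fingerprints are plain Python sets of feature identifiers, and a product's fingerprint is assumed (purely for illustration) to be the union of its two reagents' feature sets. In practice the enumerated product would be fingerprinted by a cheminformatics toolkit. `tanimoto` and `score_library` are hypothetical names.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets.

    Convention assumed here: two empty fingerprints score 0.0.
    """
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def score_library(acids, amines, target):
    """Score every acid/amine pair of a fully enumerated library.

    Assumption for illustration only: the product fingerprint is the
    union of the two reagent fingerprints.
    """
    return {
        (i, j): tanimoto(acid | amine, target)
        for i, acid in enumerate(acids)
        for j, amine in enumerate(amines)
    }


if __name__ == "__main__":
    acids = [{1, 2}, {5}]
    amines = [{3}, {4}]
    scores = score_library(acids, amines, target={1, 2, 3})
    best = max(scores, key=scores.get)
    print(best, scores[best])
```

A 400 x 400 library yields 160,000 such scores, which is small enough to enumerate exhaustively and use as a benchmark for the search.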

  29. Characteristic landscape: SIMILARITY (400 x 400) … (top 0.05% solutions). Contour plot of the top 0.05% solutions landscape.

  30. Top 0.05% solutions in 400x400 library identified by exhaustive search and enumeration – emergence of high performance (HP) axes relating to particular reactants

  31. Plot of FCS solutions of a typical COGA search of the test library. On average COGA identified ~200 solutions out of the best 800 (top 0.05%).

  32. Heuristics developed to improve exploratory capabilities include various tabu lists that monitor the number of visits to each high-performance axis and reassign solutions when cover of a particular high-performance axis is considered adequate in terms of number and distribution of solutions. Various strategies for the replacement of 'tabu' solutions promote further sampling / exploration of the reagent space. Extensive experimentation was carried out.
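
One way such an axis-level tabu list might look is sketched below. This is a deliberately simplified, hypothetical version: an axis (a reagent index) becomes tabu once it has been visited `max_cover` times, and a solution landing on a tabu axis has that reagent swapped for a randomly chosen non-tabu one. The slide also mentions the distribution of solutions along an axis, which this sketch ignores. `AxisTabu` and its methods are illustrative names.

```python
import random
from collections import Counter


class AxisTabu:
    """Track visits per reagent axis and redirect search away from
    axes that are already adequately covered (count-based assumption)."""

    def __init__(self, n_reagents, max_cover, rng=None):
        self.n_reagents = n_reagents
        self.max_cover = max_cover
        self.visits = Counter()
        self.rng = rng or random.Random()

    def record(self, reagent):
        """Note that a high-performance solution used this reagent."""
        self.visits[reagent] += 1

    def is_tabu(self, reagent):
        return self.visits[reagent] >= self.max_cover

    def reassign(self, reagent):
        """Return the reagent unchanged unless its axis is tabu, in which
        case pick a random non-tabu reagent (if any remain)."""
        if not self.is_tabu(reagent):
            return reagent
        free = [r for r in range(self.n_reagents) if not self.is_tabu(r)]
        return self.rng.choice(free) if free else reagent


if __name__ == "__main__":
    tabu = AxisTabu(n_reagents=400, max_cover=2, rng=random.Random(0))
    tabu.record(17)
    tabu.record(17)
    print(tabu.is_tabu(17), tabu.reassign(17) != 17)
```

Random replacement is only one of several possible replacement strategies; the choice affects how aggressively the search samples unexplored reagents.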

  33. Results from fifty runs of COGA with developed heuristics – very significant increase in number of HP axes and top end solutions identified. Relatively robust performance – significant improvement.

  34. Increasing Dimension: Moving away from the initial test set, COGA (with heuristics) was applied to a 3-reagent library comprising primary amines, aromatic acids and aldehydes. Total size = 400 x 400 x 400 (64 x 10^6) reactant combinations. The focussed library approach was again used, with chemical similarity to methotrexate as the criterion.

  35. COGA output for identifying high-performance Reagents for inclusion in focussed libraries (3 Reagents):

  36. Constraint Satisfaction • A number of constraints have been included: • molecular weight • hydrogen bond donors • hydrogen bond acceptors • rotational bonds • reaction energy • polar surface area, etc. • Initial focussed libraries comprising feasible solutions can be generated by introducing standard EC constraint handling techniques and further filtering out of non-desirables during the COGA processing.
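
A static penalty function is one standard EC constraint-handling technique, sketched here in Python. Which technique the project actually used is not stated, so this is an assumption. The property names and numeric limits in the usage example are illustrative placeholders, not values from the presentation, and `penalised_fitness` is a hypothetical name.

```python
def penalised_fitness(raw, properties, limits, weight=1.0):
    """Subtract a penalty proportional to the total normalised
    constraint violation from the raw fitness.

    limits maps a property name to an allowed (low, high) range.
    """
    violation = 0.0
    for name, (low, high) in limits.items():
        value = properties[name]
        span = (high - low) or 1.0  # avoid dividing by zero for point ranges
        if value < low:
            violation += (low - value) / span
        elif value > high:
            violation += (value - high) / span
    return raw - weight * violation


if __name__ == "__main__":
    # Illustrative limits only (not taken from the source).
    limits = {"mol_weight": (0.0, 500.0), "h_bond_donors": (0, 5)}
    candidate = {"mol_weight": 550.0, "h_bond_donors": 3}
    print(penalised_fitness(0.8, candidate, limits))
```

Feasible candidates keep their raw fitness, so a final filter that discards any solution with a non-zero violation leaves only feasible library members.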

  37. Multi-objective Satisfaction • Transferring COGA multi-objective approaches from engineering application to drug design has resulted in a capability to generate focussed libraries of solutions that best satisfy criteria relating to: • Similarity • QSAR (solubility) • QSAR (log p) • Docking • Fuzzy preference components have been integrated to facilitate user-interaction

  38. Results from 50 runs of COGA on each objective

  39. Results from 50 runs of COGA on each objective

  40. Results from 50 runs of COGA on each objective

  41. Projection of COGA output onto objective space

  42. User Preferences • Preference questions, each answered with one of the relations =, >, >>, <, <<: 1. Similarity vs QSAR (LOGP) 2. QSAR (LOGP) vs QSAR (SOL) 3. Similarity vs QSAR (LOGP)

  43. Similar linguistic fuzzy preferences (Fodor and Roubens) were previously used in the BAE airframe design work (Cvetkovic & Parmee). The preference selection for library focussing directly affects the settings of the COGA adaptive filter.

  44. User Preferences (panels: SIM ~ QSAR; SIM >> QSAR) • A significant overlap of solutions signifies the presence of common axis clusters. • Preference change improves this overlap.

  45. De Novo Design • Other collaborative work involves more direct evolutionary de novo molecule design, utilising classified known chemical reactions along with a database of available reagents as mutation operators. • Non-evolutionary programs tend to add and remove atoms or fragments to get a better fitness, i.e. they grow molecules in a simulated environment. • GA approaches tend to use a splicing approach, i.e. "ripping" two molecules apart, swapping fragments over and forcing them back together.

  46. Current approaches are unnatural: a bench chemist modifies molecules via a reaction carried out in a flask, so an appropriate simulation is required. Evotec OAI has a Corporate Chemical Database (CCD) of reactions used to enumerate virtual combinatorial libraries. The developed Evolutionary Programming (EP) approach utilises a combination of this CCD and an internal database of commercially available compounds, the Evotec Supplier Database (ESD). Similarity, QSAR or docking criteria provide fitness evaluation functions. Integration of EP results in evolution of high-performance de novo molecule designs.

  47. Molecule Mutations / Modifications • Fitness-proportionate selection (roulette wheel) provides a population member to be mutated. • An appropriate reaction is selected from the CCD and applied to the population member. • Mutation types: addition, cleavage and transformation. • Fitness of the mutated compound is then calculated and the individual placed in the appropriate position within a fitness-ranked population.
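
The selection, mutation and ranked-insertion cycle described above can be sketched as follows. This is a toy outline under several assumptions: molecules are stand-in strings, a "reaction" is any Python callable that maps one molecule to another (standing in for a CCD entry), the reaction is chosen at random rather than by applicability, fitness values are non-negative, and the lowest-ranked member is dropped when an optional `max_size` is exceeded. All function names are hypothetical.

```python
import bisect
import random


def roulette_select(ranked, rng):
    """Fitness-proportionate selection from a list of (fitness, individual)."""
    total = sum(f for f, _ in ranked)
    if total <= 0:
        return rng.choice(ranked)[1]
    pick = rng.uniform(0.0, total)
    running = 0.0
    for f, individual in ranked:
        running += f
        if running >= pick:
            return individual
    return ranked[-1][1]


def insert_ranked(ranked, fitness, individual, max_size=None):
    """Insert into a list kept sorted by descending fitness."""
    keys = [-f for f, _ in ranked]
    ranked.insert(bisect.bisect_right(keys, -fitness), (fitness, individual))
    if max_size is not None:
        del ranked[max_size:]  # assumption: the worst members are dropped


def ep_step(ranked, reactions, fitness_fn, rng, max_size=None):
    """One mutation step: select a parent, apply a reaction, rank the child."""
    parent = roulette_select(ranked, rng)
    reaction = rng.choice(reactions)  # assumption: random, not applicability-based
    child = reaction(parent)
    insert_ranked(ranked, fitness_fn(child), child, max_size)
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    addition = lambda m: m + "C"           # toy 'addition' mutation
    cleavage = lambda m: m[:-1] or m       # toy 'cleavage' mutation
    population = [(2.0, "CC")]
    for _ in range(5):
        ep_step(population, [addition, cleavage], len, rng, max_size=10)
    print(population)
```

Biasing the choice among addition, cleavage and transformation reactions is one lever for controlling how quickly molecules grow.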

  48. Main difficulties relate to: • unacceptable growth of molecules, which can be overcome by controlling the frequency of each mutation type and introducing penalty functions; • the complex nature of the search space, which is discontinuous with possible disjoint regions, so some known solutions are difficult to reach; • extensive tuning, which did not greatly improve the search and exploration.
