A Multiobjective Approach to Combinatorial Library Design


Val Gillet, University of Sheffield, UK

Outline
  • SELECT
    • GA based program for combinatorial library design
    • Combinatorial subset selection in product-space
    • Multiobjective optimisation via weighted-sum fitness function
  • Limitations of a weighted-sum approach
  • MoSELECT
    • Multiobjective optimisation via MOGA
Library Design is a Multiobjective Optimisation Problem
  • Early HTS results disappointing
    • Low hit rates
    • Hits too lipophilic; too flexible; high molecular weights…
  • Diverse libraries
    • Distance-based/cell-based diversity
    • Bioavailability; cost; ease of synthesis…
  • Focused/targeted libraries
    • Similarity to known active; predicted active by QSAR model; fit to receptor site
    • Bioavailability; cost…
Product-Based Library Design
  • A two-component combinatorial library can be represented by a 2D array
  • A combinatorial subset can be defined by intersecting rows and columns of the array
  • Exploring all combinatorial subsets is equivalent to testing all permutations of the rows and columns of the array
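To give a feel for the size of this search space, the sketch below counts and enumerates combinatorial subsets of a two-component library. The function names are hypothetical, and the 100 × 100 library with 30 × 30 subsets is borrowed from the amide example later in the talk. The count is simply the number of ways to choose rows times the number of ways to choose columns.

```python
from itertools import combinations
from math import comb

def count_subsets(n_rows, n_cols, k_rows, k_cols):
    """Number of k_rows x k_cols combinatorial subsets of an n_rows x n_cols array."""
    return comb(n_rows, k_rows) * comb(n_cols, k_cols)

def enumerate_subsets(n_rows, n_cols, k_rows, k_cols):
    """Yield every (rows, cols) pair; only feasible for very small arrays."""
    for rows in combinations(range(n_rows), k_rows):
        for cols in combinations(range(n_cols), k_cols):
            yield rows, cols

small = list(enumerate_subsets(4, 3, 2, 2))
print(len(small))  # 18

# The 30 x 30 subsets of a 100 x 100 library are far too numerous to enumerate,
# which is the motivation for a stochastic search such as a GA.
n_large = count_subsets(100, 100, 30, 30)
```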
Selecting Combinatorial Subsets Using a GA
[Figure: example chromosome for a 6 × 4 subset, the integer string 11 8 2 30 7 25 10 1 19 18, divided into partitions labelled R1 and R2]
  • Chromosome encoding
    • each chromosome represents a combinatorial subset as an integer string
    • one partition for each reactant pool
    • the size of a partition equals the no. of reactants required from the corresponding pool
  • Crossover, mutation and roulette wheel parent selection are used to evolve new potential solutions
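The encoding and operators above can be sketched roughly as follows. This is an illustrative sketch, not SELECT's implementation: the function names are hypothetical, the chromosome is stored as one list per partition rather than a single integer string, the fitness values are placeholders, and crossover is omitted for brevity.

```python
import random

def random_chromosome(pool_sizes, subset_sizes, rng):
    """One partition per reactant pool; each partition holds distinct reactant indices."""
    return [rng.sample(range(pool), k) for pool, k in zip(pool_sizes, subset_sizes)]

def roulette_wheel(population, fitnesses, rng):
    """Pick one parent with probability proportional to its (non-negative) fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

def mutate(chromosome, pool_sizes, rng):
    """Swap one reactant in one partition for an unused reactant from the same pool."""
    child = [list(part) for part in chromosome]
    p = rng.randrange(len(child))
    unused = [r for r in range(pool_sizes[p]) if r not in child[p]]
    if unused:
        child[p][rng.randrange(len(child[p]))] = rng.choice(unused)
    return child

rng = random.Random(0)
pool_sizes, subset_sizes = (100, 100), (6, 4)  # a 6 x 4 subset of a 100 x 100 library
population = [random_chromosome(pool_sizes, subset_sizes, rng) for _ in range(10)]
fitnesses = [1.0 + i for i in range(10)]       # placeholder fitness values
parent = roulette_wheel(population, fitnesses, rng)
child = mutate(parent, pool_sizes, rng)
```

Keeping each partition free of duplicates means every chromosome always decodes to a valid combinatorial subset of the required size.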
Multiobjective Optimisation in SELECT
  • Weighted-sum fitness function
    • enumerate the combinatorial library represented by a chromosome
    • calculate descriptors for molecules in the library
  • Objectives are scaled and user-defined weights are applied
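A minimal sketch of a weighted-sum fitness of the form f = w1a + w2b + ... is shown below. The slides do not say how SELECT scales its objectives, so min-max scaling against assumed bounds is used here purely for illustration; the function names, bounds, weights and objective values are all hypothetical.

```python
def scale(value, lo, hi):
    """Min-max scale a raw objective value; lo and hi are assumed bounds."""
    return (value - lo) / (hi - lo)

def weighted_sum_fitness(objectives, bounds, weights):
    """Combine scaled objectives into a single score using user-defined weights."""
    return sum(w * scale(v, lo, hi)
               for v, (lo, hi), w in zip(objectives, bounds, weights))

# Hypothetical example: diversity is rewarded (positive weight) and a
# property-profile difference is penalised (negative weight).
fitness = weighted_sum_fitness(
    objectives=[0.57, 0.60],
    bounds=[(0.5, 0.6), (0.5, 1.0)],
    weights=[1.0, -0.5],
)
```

Changing the weights changes which library scores best, which is why the choice of weights matters so much in this formulation.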
Multiobjective Optimisation in SELECT cont.
  • Diversity indices
    • distance-based (e.g. sum of pairwise dissimilarities computed from Daylight fingerprints)
    • cell-based
  • Physical property terms
    • minimise the difference between the distribution in the library and some reference distribution, e.g.
      • “drug-like” profile derived from WDI
  • Cost: £
    • minimise the cost of the library
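Two of these terms can be sketched as follows. The assumptions are loud ones: fingerprints are represented as Python sets of "on" bit positions, the Tanimoto coefficient is used as the similarity measure (the slides do not name one), the profile term is a sum of absolute differences between binned fractions, and no normalisation is attempted. All names and numbers are hypothetical.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def sum_pairwise_dissimilarity(fingerprints):
    """Distance-based diversity: sum of (1 - similarity) over all pairs."""
    return sum(1.0 - tanimoto(a, b) for a, b in combinations(fingerprints, 2))

def profile_difference(values, bin_edges, reference):
    """Sum of absolute differences between binned library and reference fractions."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                counts[i] += 1
                break
    fractions = [c / len(values) for c in counts]
    return sum(abs(f - r) for f, r in zip(fractions, reference))

fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
diversity = sum_pairwise_dissimilarity(fps)

# Molecular weights of a toy library against a made-up reference profile.
mw_diff = profile_difference([250, 320, 410, 480],
                             [0, 200, 400, 600, 800],
                             [0.1, 0.5, 0.35, 0.05])
```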
Library Enumeration in SELECT
  • Virtual library is enumerated upfront
    • ADEPT (A Daylight Enumeration and Profiling Tool)
    • Identify potential reactants
    • Filter out unwanted ones
    • Enumerate virtual library
      • Reaction Toolkit (Reaction transforms; MTZ language)
  • Descriptors are calculated upfront
  • Combinatorial subset accessed via fast lookup
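One way to picture the fast lookup is a descriptor table computed once for the whole virtual library and indexed by the reactants named in a chromosome. The sketch below assumes a NumPy array as the table and uses random placeholder values in place of real molecular weights; the names are hypothetical and nothing here reflects how SELECT actually stores its data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder descriptor table, computed once up front for a 100 x 100 virtual
# library: entry [i, j] belongs to the product of reactant i (pool 1) and
# reactant j (pool 2).
mol_weight = rng.uniform(150.0, 650.0, size=(100, 100))

def subset_descriptors(table, chromosome):
    """Look up descriptors for every product in the subset a chromosome encodes."""
    rows, cols = chromosome
    return table[np.ix_(rows, cols)].ravel()

chromosome = ([3, 17, 42], [5, 8])  # a 3 x 2 subset
mw = subset_descriptors(mol_weight, chromosome)
```

Because no molecule is built or profiled inside the GA loop, evaluating a candidate subset reduces to an array slice.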
Example: Amide Library
  • 10K virtual library: 100 amines × 100 carboxylic acids
  • 30 × 30 amide subsets
  • WDI: World Drug Index
  • Reactant-based selection: diversity (Diversity 0.564)
  • Product-based selection: diversity & molecular weight profile (Diversity 0.573)
[Figure: molecular weight profiles (percentage of compounds, 0 to 25, against molecular weight, 0 to 800) for the WDI, the product-based subset and the reactant-based subset]

Limitations of a Weighted-Sum Fitness Function
  • Definition of fitness function difficult especially for different types of objectives
    • e.g. molecular weight profile and cost
  • Setting of weights is non-intuitive
  • Can result in regions of search space being obscured especially when objectives are in competition
  • Difficult to monitor progress since >1 objective to follow simultaneously
  • A single solution is found
Varying Weights in SELECT
  • Objectives are in competition resulting in trade-offs
  • A family of alternative solutions exists, all of which are equivalent
Multiobjective Optimisation
  • Evolutionary algorithms, e.g., GAs
    • operate with a population of individuals
    • well suited to search for multiple solutions in parallel
    • readily adapted to deal with multiobjective optimisation
  • MOGA: MultiObjective Genetic Algorithm
    • Fonseca & Fleming. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 28(1), 1998, 26-37.
MOGA
  • Multiple objectives are handled independently without summation and without weights
  • A hyper-surface is mapped out in the search space
    • represents a continuum of solutions where all solutions are seen as equivalent
    • represents compromises or trade-offs between the various objectives
    • solutions are called non-dominated, or Pareto solutions.
  • A family of non-dominated solutions is sought rather than a single solution
Dominance & Pareto Ranking
  • A non-dominated individual is one where an improvement in one objective results in a deterioration in one or more of the other objectives when compared with the other individuals in the population
  • Pareto ranking: an individual’s rank corresponds to the number of individuals in the current population by which it is dominated
[Figure: population plotted in objective space (f1 against f2), each individual labelled with its Pareto rank (values 0, 1, 2 and 4 appear); two individuals are marked A and B]
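Pareto ranking as defined on this slide can be sketched in a few lines. The sketch assumes every objective is to be minimised and uses the usual definition of dominance (no worse in every objective, strictly better in at least one); the function names and the five sample points are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_ranks(population):
    """Rank of each individual = number of individuals that dominate it."""
    return [sum(dominates(other, ind) for other in population)
            for ind in population]

# Each tuple is (f1, f2) for one individual; both objectives are minimised.
points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
ranks = pareto_ranks(points)
print(ranks)  # [0, 0, 0, 1, 4]

non_dominated = [p for p, r in zip(points, ranks) if r == 0]
```

Individuals with rank 0 form the current non-dominated set, so ranking on dominance favours the whole trade-off surface rather than a single weighted optimum.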

SELECT and MoSELECT*
  • SELECT
    • Initialise population
    • Select parents
    • Apply genetic operators
    • Calculate objectives: a, b, c, ...
    • Apply fitness function: f = w1a + w2b + w3c + ...
    • Rank based on fitness
    • Test for convergence
    • Single solution
  • MoSELECT
    • Initialise population
    • Select parents
    • Apply genetic operators
    • Calculate objectives: a, b, c, ...
    • Calculate dominance: a, b, c
    • Rank using Pareto ranking, based on dominance
    • Test for convergence
    • Family of solutions
* Patent applied for

MoSELECT: Search Progress
[Figure: snapshots of the population after 0, 100, 1000 and 5000 iterations]
Family of Solutions
[Figure: family of solutions after 5000 iterations; axes are Diversity (0.574 to 0.594) and ΔMW (0.58 to 0.64)]
  • Each run of MoSELECT results in a family of solutions
  • Finding the same coverage of solutions using SELECT would require multiple runs using various combinations of weights
  • One run of MoSELECT takes the same CPU time as one run of SELECT

Focused Library: Aminothiazoles
  • α-bromoketones & thioureas extracted from ACD
  • ADEPT used to
    • filter reactants (MW < 300; RB < 8)
    • enumerate virtual library => 12850 products (74 α-bromoketones & 170 thioureas)
  • MoSELECT used to design 15×30 subsets optimised on
    • Similarity to a target compound (Daylight fingerprints)
    • Cost ($/g)
MoSELECT Solutions: 2
[Figure: solutions found by running MoSELECT with niching, after 5000 iterations]

Moving to > 2 Objectives: Parallel Graph Representation
[Figure: solutions after 5000 iterations; axes are Diversity (0.578 to 0.594) and ΔMW (0.58 to 0.64)]

Each objective is scaled using the maximum and minimum values achieved when that objective is optimised independently
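The scaling step can be sketched as below, assuming the per-objective minimum and maximum from the independent optimisations are already known. The function name is hypothetical and the numbers are illustrative values chosen within the plotted axis ranges, not results read from the slides. Each row of the output corresponds to one solution, that is, one line drawn across the parallel axes.

```python
import numpy as np

def scale_for_parallel_graph(solutions, obj_min, obj_max):
    """Scale each objective column to [0, 1] using its independent min and max."""
    solutions = np.asarray(solutions, dtype=float)
    obj_min = np.asarray(obj_min, dtype=float)
    obj_max = np.asarray(obj_max, dtype=float)
    return (solutions - obj_min) / (obj_max - obj_min)

# Illustrative family of three solutions; columns are (Diversity, delta-MW).
family = [[0.578, 0.64],
          [0.586, 0.60],
          [0.594, 0.58]]
scaled = scale_for_parallel_graph(family,
                                  obj_min=[0.574, 0.58],
                                  obj_max=[0.594, 0.64])
```

Putting every objective on a common 0 to 1 axis is what lets objectives with different units share one plot.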

Focused Library: Amides
  • 100 × 100 virtual library
  • MoSELECT used to design 10 × 10 subsets
  • Objectives
    • Similarity to a target
      • Sum of similarities using Daylight fingerprints
    • Predicted bioavailability
      • Each compound rated from 1 to 4
      • Sum of ratings
    • Hydrogen bond profile
    • Rotatable bond profile
MoSELECT Solutions
  • Population size: 50
  • Iterations: 5000
  • Niching: 30%
  • Number of solutions: 11
  • CPU time: 53 s (R12K 360 MHz)
Conclusions
  • Advantages of MoSELECT
    • a family of equivalent solutions is obtained in a single run with each solution representing one combinatorial library
    • this is achieved at vastly reduced computational cost compared to performing multiple runs of SELECT
    • no need to determine weights for objectives
    • optimisation of different types of objectives is readily achieved
    • visualisation of the search progress allows trade-offs between objectives to be observed
    • the user can make an informed choice on which solution(s) to explore
Acknowledgements
  • Illy Khatib, Peter Willett; Information Studies, University of Sheffield
  • Peter Fleming; Automatic Control and Systems Engineering, University of Sheffield
  • Darren Green, Andrew Leach; GlaxoSmithKline, UK
  • Funding by GlaxoSmithKline, UK
  • John Bradshaw; Daylight
  • Daylight for software support