
A Multiobjective Approach to Combinatorial Library Design


Val Gillet

University of Sheffield, UK

- SELECT
- GA based program for combinatorial library design
- Combinatorial subset selection in product-space
- Multiobjective optimisation via weighted-sum fitness function

- Limitations of a weighted-sum approach
- MoSELECT
- Multiobjective optimisation via MOGA

- Early HTS results disappointing
- Low hit rates
- Hits too lipophilic; too flexible; high molecular weights…

- Diverse libraries
- Distance-based/cell-based diversity
- Bioavailability; cost; ease of synthesis…

- Focused/targeted libraries
- Similarity to known active; predicted active by QSAR model; fit to receptor site
- Bioavailability; cost…

- A two-component combinatorial library can be represented by a 2D array
- A combinatorial subset can be defined by intersecting rows and columns of the array
- Exploring all combinatorial subsets is equivalent to testing all permutations of the rows and columns of the array
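The row and column picture above can be sketched in a few lines of Python. This is only an illustration: the function names and the 10 × 10 array size are assumptions made for the example, not part of SELECT.

```python
from itertools import product
from math import comb

def combinatorial_subset(rows, cols):
    """Return the (row, col) cells where the chosen rows and columns intersect."""
    return list(product(rows, cols))

def count_subsets(n_rows, n_cols, k_rows, k_cols):
    """Number of distinct k_rows x k_cols subsets of an n_rows x n_cols array."""
    return comb(n_rows, k_rows) * comb(n_cols, k_cols)

# Pick 3 reactants from R1 and 2 from R2: the subset holds 3 x 2 = 6 products.
subset = combinatorial_subset(rows=[0, 2, 5], cols=[1, 3])
print(len(subset))                   # 6
print(count_subsets(10, 10, 3, 2))   # 120 * 45 = 5400
```

Even for a small array the number of candidate subsets grows quickly, which is why a search heuristic such as a GA is used instead of exhaustive enumeration.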

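[Figure: two-component library drawn as a 2D array with axes R1 and R2, showing a 6 × 4 subset; integer labels 11, 8, 2, 30, 7, 25, 10, 1, 19, 18]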

- Chromosome encoding
- each chromosome represents a combinatorial subset as an integer string
- one partition for each reactant pool
- the size of a partition equals the no. of reactants required from the corresponding pool

- Crossover, mutation and roulette wheel parent selection are used to evolve new potential solutions
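A minimal sketch of this encoding follows, assuming hypothetical pool sizes of 30 reactants each and a 6 × 4 subset. The slides do not describe the exact operators, so the partition-level crossover and single-gene mutation used here are illustrative choices rather than SELECT's own.

```python
import random

def random_chromosome(pool_sizes, pick_sizes, rng):
    """Build one partition per reactant pool, each holding distinct reactant indices."""
    return [rng.sample(range(n), k) for n, k in zip(pool_sizes, pick_sizes)]

def mutate(chromosome, pool_sizes, rng):
    """Swap one selected reactant for an unselected one from the same pool."""
    child = [list(part) for part in chromosome]
    p = rng.randrange(len(child))
    unused = [i for i in range(pool_sizes[p]) if i not in child[p]]
    if unused:
        child[p][rng.randrange(len(child[p]))] = rng.choice(unused)
    return child

def crossover(parent_a, parent_b, rng):
    """Partition-level crossover: each partition is inherited whole from one parent."""
    return [list(rng.choice(pair)) for pair in zip(parent_a, parent_b)]

def roulette_select(population, fitnesses, rng, k=2):
    """Roulette wheel selection: pick k parents with probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=k)

rng = random.Random(0)
pool_sizes, pick_sizes = [30, 30], [6, 4]   # hypothetical pools; 6 x 4 subset
parent_a = random_chromosome(pool_sizes, pick_sizes, rng)
parent_b = random_chromosome(pool_sizes, pick_sizes, rng)
child = mutate(crossover(parent_a, parent_b, rng), pool_sizes, rng)
print(child)
```

Inheriting whole partitions keeps every child valid (no duplicate reactants within a pool) without a repair step, at the cost of less mixing than a crossover that cuts inside a partition.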

- Weighted-sum fitness function
- enumerate the combinatorial library represented by a chromosome
- calculate descriptors for molecules in the library

- Objectives are scaled and user defined weights are applied
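To illustrate how user-defined weights steer a weighted-sum fitness, here is a small sketch. The objective names, the scaled values and the weights are all made up for the example; the only thing taken from the slides is the form f = w1a + w2b + ...

```python
def weighted_sum(scaled, weights):
    """f = w1*a + w2*b + ...; objectives are assumed already scaled to [0, 1], higher is better."""
    return sum(weights[name] * scaled[name] for name in weights)

# Two hypothetical candidate libraries with scaled objective values.
lib_a = {"diversity": 0.9, "mw_profile": 0.3}
lib_b = {"diversity": 0.5, "mw_profile": 0.8}

for w in ({"diversity": 1.0, "mw_profile": 0.5},
          {"diversity": 0.5, "mw_profile": 1.0}):
    best = max(("A", lib_a), ("B", lib_b), key=lambda item: weighted_sum(item[1], w))
    print(w, "->", best[0])
```

With the first set of weights library A scores higher; with the second, library B does. The ranking of candidates therefore depends on the weights chosen before the run.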

- Diversity indices
- distance-based (e.g. sum of pairwise dissimilarities computed on Daylight fingerprints)
- cell-based
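Both kinds of diversity index can be sketched as follows. Fingerprints are represented here as plain Python sets of "on" bit positions, standing in for Daylight fingerprints, and the Tanimoto coefficient and uniform grid are assumptions chosen for the example; the slides do not state which coefficient or binning SELECT uses.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def sum_pairwise_dissimilarity(fps):
    """Distance-based diversity: sum of (1 - similarity) over all pairs."""
    return sum(1.0 - tanimoto(a, b) for a, b in combinations(fps, 2))

def occupied_cells(points, bin_width):
    """Cell-based diversity: number of distinct grid cells hit by the descriptor vectors."""
    return len({tuple(int(x // bin_width) for x in p) for p in points})

fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
print(sum_pairwise_dissimilarity(fps))                            # 0.5 + 1.0 + 1.0 = 2.5
print(occupied_cells([(0.1, 0.2), (0.3, 0.4), (1.2, 0.1)], 1.0))  # 2
```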

- Physical property terms
- minimise the difference between the distribution in the library and some reference distribution, e.g. a “drug-like” profile derived from WDI

- Cost (£)
- minimise the cost of the library
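One way to express the physical property term is to bin a property such as molecular weight and compare the resulting percentages with a reference profile. The sketch below assumes a sum of absolute differences as the distance measure and uses an invented reference profile; it is not WDI data and not necessarily the measure used in SELECT.

```python
def profile(values, edges):
    """Percentage of values falling in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    # Values outside the bin range are not counted in any bin.
    return [100.0 * c / len(values) for c in counts]

def profile_difference(library_profile, reference_profile):
    """Sum of absolute differences between two percentage profiles (to be minimised)."""
    return sum(abs(a - b) for a, b in zip(library_profile, reference_profile))

edges = [0, 200, 400, 600, 800]              # molecular weight bins
library = profile([150, 250, 350, 450], edges)
reference = [10.0, 50.0, 30.0, 10.0]         # hypothetical reference profile
print(library)                               # [25.0, 50.0, 25.0, 0.0]
print(profile_difference(library, reference))  # 15 + 0 + 5 + 10 = 30.0
```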

- Virtual library is enumerated upfront
- ADEPT (A Daylight Enumeration and Profiling Tool)
- Identify potential reactants
- Filter out unwanted ones
- Enumerate virtual library
- Reaction Toolkit (Reaction transforms; MTZ language)

- Descriptors are calculated upfront
- Combinatorial subset accessed via fast lookup
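The benefit of computing descriptors upfront can be shown with a simple lookup table. The descriptor function below is a hypothetical stand-in for a real property calculation; only the idea of evaluating each virtual product once and indexing by reactant pair comes from the slides.

```python
def precompute(n_r1, n_r2, descriptor):
    """Evaluate the descriptor once per virtual product and store it in a 2D table."""
    return [[descriptor(i, j) for j in range(n_r2)] for i in range(n_r1)]

def subset_values(table, r1_idx, r2_idx):
    """Fetch descriptor values for the subset defined by the chosen reactants."""
    return [table[i][j] for i in r1_idx for j in r2_idx]

# Hypothetical descriptor for the product of reactant i (pool 1) and j (pool 2).
def fake_descriptor(i, j):
    return 100 + 2 * i + 3 * j

table = precompute(100, 100, fake_descriptor)   # 10,000 products, computed once
print(subset_values(table, [0, 1], [0, 2]))     # [100, 106, 102, 108]
```

During the GA run, scoring a chromosome then costs only table lookups, however expensive the original descriptor calculation was.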

10K virtual library: 100 amines × 100 carboxylic acids; 30 × 30 amide subsets

WDI: World Drug Index

- Reactant-based selection: diversity (Diversity 0.564)
- Product-based selection: diversity & molecular weight profile (Diversity 0.573)

[Figure: molecular weight profiles (percentage of compounds against molecular weight, 0 to 800) for the reactant-based and product-based libraries, compared with WDI]

- Definition of fitness function difficult especially for different types of objectives
- e.g. molecular weight profile and cost

- Setting of weights is non-intuitive
- Can result in regions of search space being obscured especially when objectives are in competition
- Difficult to monitor progress since >1 objective to follow simultaneously
- A single solution is found

- Objectives are in competition resulting in trade-offs
- A family of alternative solutions exists that are all equivalent

- Evolutionary algorithms, e.g., GAs
- operate with a population of individuals
- well suited to search for multiple solutions in parallel
- readily adapted to deal with multiobjective optimisation

- MOGA: MultiObjective Genetic Algorithm
- Fonseca & Fleming. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 28(1), 1998, 26-37.

- Multiple objectives are handled independently without summation and without weights
- A hyper-surface is mapped out in the search space
- represents a continuum of solutions where all solutions are seen as equivalent
- represents compromises or trade-offs between the various objectives
- solutions are called non-dominated, or Pareto solutions.

- A family of non-dominated solutions is sought rather than a single solution

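[Figure: population plotted in objective space, each individual labelled with its Pareto rank (values shown: 0, 1, 2, 4)]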

- Pareto ranking: an individual’s rank corresponds to the number of individuals in the current population by which it is dominated


- A non-dominated individual is one where an improvement in one objective results in a deterioration in one or more of the other objectives when compared with the other individuals in the population
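Dominance and Pareto ranking as defined above can be written compactly. The sketch assumes every objective is to be minimised and uses five made-up points (f1, f2); the direction of optimisation and the data are assumptions for the example.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(population):
    """Rank of each individual = number of individuals in the population that dominate it."""
    return [sum(dominates(other, ind) for other in population) for ind in population]

points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]   # hypothetical (f1, f2) values
print(pareto_ranks(points))                         # [0, 0, 0, 1, 4]
```

The three rank-0 points are the non-dominated set: moving from one to another improves f1 only by making f2 worse, or the reverse.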

[Figure: objective space with axes f1 and f2; individuals A and B marked]

MoSELECT*

- Initialise population
- Select parents
- Apply genetic operators
- Calculate objectives: a, b, c...
- Calculate dominance: a, b, c
- Rank using Pareto ranking, based on dominance
- Test for convergence
- Family of solutions

SELECT

- Initialise population
- Select parents
- Apply genetic operators
- Calculate objectives: a, b, c...
- Apply fitness function: f = w1a + w2b + w3c + ...
- Rank based on fitness
- Test for convergence
- Single solution

* Patent applied for
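The Pareto-ranked loop can be sketched on a toy problem. This is not the MoSELECT implementation: the reactant costs, the "spread" stand-in for diversity, the mutation-only operator, the truncation step and the fixed iteration budget (in place of a convergence test) are all assumptions made to keep the example short and runnable.

```python
import random

def dominates(a, b):
    """a dominates b if it is no worse everywhere and better somewhere (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(scores):
    """Rank = number of individuals that dominate this one."""
    return [sum(dominates(other, s) for other in scores) for s in scores]

def moga(evaluate, random_individual, mutate, pop_size=20, iterations=100, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(iterations):  # fixed budget stands in for a convergence test
        ranks = pareto_ranks([evaluate(ind) for ind in population])
        # Select parents: a lower Pareto rank gives a larger roulette wheel share.
        parents = rng.choices(population, weights=[1.0 / (1 + r) for r in ranks], k=pop_size)
        # Apply genetic operators (mutation only, to keep the sketch short).
        pool = population + [mutate(p, rng) for p in parents]
        # Calculate objectives and dominance, then keep the best-ranked individuals.
        pool_ranks = pareto_ranks([evaluate(ind) for ind in pool])
        order = sorted(range(len(pool)), key=lambda i: pool_ranks[i])
        population = [pool[i] for i in order[:pop_size]]
    scores = [evaluate(ind) for ind in population]
    ranks = pareto_ranks(scores)
    # The family of solutions: distinct rank-0 (non-dominated) individuals.
    return sorted({(ind, s) for ind, s, r in zip(population, scores, ranks) if r == 0})

COSTS = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]   # hypothetical reactant costs

def random_individual(rng):
    return tuple(sorted(rng.sample(range(len(COSTS)), 3)))

def mutate(ind, rng):
    unused = [i for i in range(len(COSTS)) if i not in ind]
    genes = list(ind)
    genes[rng.randrange(len(genes))] = rng.choice(unused)
    return tuple(sorted(genes))

def evaluate(ind):
    cost = sum(COSTS[i] for i in ind)
    spread = max(ind) - min(ind)           # crude stand-in for diversity
    return (cost, -spread)                 # both objectives are minimised

family = moga(evaluate, random_individual, mutate)
for ind, score in family:
    print(ind, score)
```

A single call returns several mutually non-dominated subsets, each a different trade-off between cost and spread, rather than one solution tied to a particular choice of weights.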

[Figure: search progress at 0, 100, 1000 and 5000 iterations; axes: diversity (D) and molecular weight profile (MW)]

- Each run of MoSELECT results in a family of solutions
- Finding the same coverage of solutions using SELECT would require multiple runs using various combinations of weights
- One run of MoSELECT takes the same cpu time as one run of SELECT


- α-bromoketones & thioureas extracted from ACD
- ADEPT used to
- filter reactants (MW < 300; RB < 8)
- enumerate virtual library => 12850 products (74 α-bromoketones & 170 thioureas)

- MoSELECT used to design 15×30 subsets optimised on
- Similarity to a target compound (Daylight fingerprints)
- Cost ($/g)

[Figure: population at 0 and 5000 iterations]

Running MoSELECT with niching

[Figure: populations at 5000 iterations; axes: diversity (D) and molecular weight profile (MW)]

Each objective is scaled using the Max and Min values achieved when the objective is optimised independently
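The scaling step might look like the sketch below, which maps an objective onto [0, 1] using the minimum and maximum found when that objective is optimised on its own. The function name and the cost bounds are hypothetical.

```python
def make_scaler(min_value, max_value):
    """Return a function mapping [min_value, max_value] linearly onto [0, 1]."""
    span = max_value - min_value
    if span == 0:
        raise ValueError("min and max must differ")
    return lambda value: (value - min_value) / span

# Hypothetical bounds from two single-objective runs on cost (minimise, then maximise).
scale_cost = make_scaler(100, 500)
print(scale_cost(200))   # 0.25
```

Putting every objective on the same [0, 1] range lets objectives with very different units, such as a similarity score and a cost, be plotted and compared on a common footing.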

- 100 × 100 virtual library
- MoSELECT used to design 10 × 10 subsets
- Objectives
- Similarity to a target
- Sum of similarities using Daylight fps

- Predicted bioavailability
- Each compound rated from 1 to 4
- Sum of ratings

- Hydrogen bond profile
- Rotatable bond profile

- Population size 50
- Iterations 5000
- Niching 30%
- Number of solutions = 11
- CPU 53s (R12K 360 MHz)

- Advantages of MoSELECT
- a family of equivalent solutions is obtained in a single run with each solution representing one combinatorial library
- this is achieved at vastly reduced computational cost compared to performing multiple runs of SELECT
- no need to determine weights for objectives
- optimisation of different types of objectives is readily achieved
- visualisation of the search progress allows trade-offs between objectives to be observed
- the user can make an informed choice on which solution(s) to explore

- Illy Khatib, Peter Willett; Information Studies, University of Sheffield
- Peter Fleming; Automatic Control and Systems Engineering, University of Sheffield
- Darren Green, Andrew Leach; GlaxoSmithKline, UK
- Funding by GlaxoSmithKline, UK
- John Bradshaw; Daylight
- Daylight for software support