A Multiobjective Approach to Combinatorial Library Design

Val Gillet

University of Sheffield, UK


Outline

  • SELECT

    • GA based program for combinatorial library design

    • Combinatorial subset selection in product-space

    • Multiobjective optimisation via weighted-sum fitness function

  • Limitations of a weighted-sum approach

  • MoSELECT

    • Multiobjective optimisation via MOGA


Library Design is a Multiobjective Optimisation Problem

  • Early HTS results disappointing

    • Low hit rates

    • Hits too lipophilic; too flexible; high molecular weights…

  • Diverse libraries

    • Distance-based/cell-based diversity

    • Bioavailability; cost; ease of synthesis…

  • Focused/targeted libraries

    • Similarity to known active; predicted active by QSAR model; fit to receptor site

    • Bioavailability; cost…


Product-Based Library Design

  • A two-component combinatorial library can be represented by a 2D array

  • A combinatorial subset can be defined by intersecting rows and columns of the array

  • Exploring all combinatorial subsets is equivalent to testing all permutations of the rows and columns of the array
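
The array view above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `combinatorial_subset` and `count_subsets` and the 5 × 4 library size are hypothetical and do not come from SELECT. The second function counts the distinct ways of choosing rows and columns, which gives a feel for how quickly the search space grows.

```python
from itertools import product
from math import comb

def combinatorial_subset(rows, cols):
    """Cells (row, column) where the chosen rows and columns intersect.

    Cell (i, j) stands for the product of reactant i from pool R1
    and reactant j from pool R2.
    """
    return list(product(rows, cols))

def count_subsets(n_rows, n_cols, k_rows, k_cols):
    """Number of distinct k_rows x k_cols combinatorial subsets of the array."""
    return comb(n_rows, k_rows) * comb(n_cols, k_cols)

# Hypothetical 5 x 4 virtual library, from which a 3 x 2 subset is taken
subset = combinatorial_subset(rows=[0, 2, 4], cols=[1, 3])
n_choices = count_subsets(5, 4, 3, 2)
print(len(subset), n_choices)
```

For a 100 × 100 library and 30 × 30 subsets, the same count is `comb(100, 30) ** 2`, which is far too large to enumerate exhaustively.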


Selecting Combinatorial Subsets Using a GA

[Figure: example chromosome for a 6 × 4 subset with partitions R1 and R2; integer string 11 8 2 30 7 25 10 1 19 18]

  • Chromosome encoding

    • each chromosome represents a combinatorial subset as an integer string

    • one partition for each reactant pool

    • the size of a partition equals the no. of reactants required from the corresponding pool

  • Crossover, mutation and roulette wheel parent selection are used to evolve new potential solutions
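
A minimal sketch of the encoding and of roulette wheel parent selection follows. It is not the SELECT implementation: the function names, the pool sizes and the fixed random seed are assumptions made for illustration, and crossover and mutation are left out.

```python
import random

def random_chromosome(pool_sizes, picks, rng):
    """One partition per reactant pool; each holds distinct reactant indices."""
    return [rng.sample(range(size), k) for size, k in zip(pool_sizes, picks)]

def roulette_wheel(population, fitnesses, rng):
    """Pick one parent with probability proportional to its non-negative fitness."""
    spin = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical pools of 30 reactants each; a 6 x 4 subset as in the slide
chromosome = random_chromosome(pool_sizes=[30, 30], picks=[6, 4], rng=rng)
parent = roulette_wheel(["a", "b", "c"], [0.0, 0.0, 1.0], rng)
```

Using `rng.sample` guarantees that no reactant appears twice within a partition, which a combinatorial subset requires.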


Multiobjective Optimisation in SELECT

  • Weighted-sum fitness function

    • enumerate the combinatorial library represented by a chromosome

    • calculate descriptors for molecules in the library

  • Objectives are scaled and user-defined weights are applied
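
The weighted-sum idea can be illustrated as below. This is a sketch under assumptions, not SELECT's code: min-max scaling, the bounds, the weights and the objective values are all hypothetical, and both objectives are assumed to be expressed so that larger is better.

```python
def scale(value, lo, hi):
    """Map a raw objective value onto [0, 1] given assumed bounds lo < hi."""
    return (value - lo) / (hi - lo)

def weighted_sum_fitness(objectives, bounds, weights):
    """f = w1*a + w2*b + ... computed over the scaled objectives."""
    return sum(
        w * scale(v, lo, hi)
        for v, (lo, hi), w in zip(objectives, bounds, weights)
    )

# Hypothetical library: a diversity score and a second, already normalised term
f = weighted_sum_fitness(
    objectives=[0.57, 0.30],
    bounds=[(0.5, 0.6), (0.0, 1.0)],
    weights=[1.0, 0.5],
)
print(round(f, 3))
```

Changing the weights changes which library scores best, so the single number `f` hides the trade-off between the individual objectives.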


Multiobjective Optimisation in SELECT cont.

  • Diversity indices

    • distance-based (e.g. sum of pairwise dissimilarities using Daylight fingerprints)

    • cell-based

  • Physical property terms

    • minimise the difference between the distribution in the library and some reference distribution, e.g.

      • “drug-like” profile derived from WDI

  • Cost: £

    • minimise the cost of the library
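
As an illustration of a distance-based index, the sketch below sums pairwise dissimilarities over a set of fingerprints. The slides do not name the similarity coefficient, so the Tanimoto coefficient is an assumption here, as are the function names and the toy fingerprints (given as sets of on-bit positions rather than real Daylight fingerprints).

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def sum_pairwise_dissimilarity(fingerprints):
    """Sum of (1 - similarity) over all unordered pairs of molecules."""
    return sum(1.0 - tanimoto(a, b) for a, b in combinations(fingerprints, 2))

# Three toy fingerprints; the first two overlap, the third shares no bits
fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
diversity = sum_pairwise_dissimilarity(fps)
print(diversity)
```

The raw sum grows with library size, so a real index would normally be normalised before libraries of different sizes are compared.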


Library Enumeration in SELECT

  • Virtual library is enumerated upfront

    • ADEPT (A Daylight Enumeration and Profiling Tool)

    • Identify potential reactants

    • Filter out unwanted ones

    • Enumerate virtual library

      • Reaction Toolkit (Reaction transforms; MTZ language)

  • Descriptors are calculated upfront

  • Combinatorial subset accessed via fast lookup
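
The upfront calculation and lookup can be pictured as follows. This is a hedged sketch, not ADEPT or SELECT: the table layout, the function names and the toy "molecular weight" formula are hypothetical stand-ins for real enumeration and descriptor calculation.

```python
def precompute_descriptors(n_r1, n_r2, descriptor):
    """Evaluate a descriptor once for every product (i, j) of the virtual library."""
    return [[descriptor(i, j) for j in range(n_r2)] for i in range(n_r1)]

def lookup_subset(table, chromosome):
    """Fetch stored values for the subset encoded by (rows, cols); nothing is recomputed."""
    rows, cols = chromosome
    return [table[i][j] for i in rows for j in cols]

def toy_mw(i, j):
    """Stand-in descriptor; a real one would be computed from the enumerated product."""
    return 100.0 + 10 * i + j

table = precompute_descriptors(4, 3, toy_mw)      # done once, before the GA runs
values = lookup_subset(table, ([0, 3], [1, 2]))   # done for every chromosome
print(values)
```

Because each fitness evaluation reduces to table lookups, the cost of descriptor calculation is paid once rather than on every iteration.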


Example: Amide Library

  • 10K virtual library

    • 100 amines × 100 carboxylic acids

  • 30 × 30 amide subsets

  • Reactant-based selection: diversity (Diversity 0.564)

  • Product-based selection: diversity & molecular weight profile (Diversity 0.573)

  • WDI: World Drug Index

[Figure: molecular weight profiles (percentage of compounds, 0 to 25, against molecular weight, 0 to 800) for the product-based and reactant-based subsets and for WDI]


Limitations of a Weighted-Sum Fitness Function

  • Definition of fitness function difficult especially for different types of objectives

    • e.g. molecular weight profile and cost

  • Setting of weights is non-intuitive

  • Can result in regions of search space being obscured especially when objectives are in competition

  • Difficult to monitor progress since >1 objective to follow simultaneously

  • A single solution is found


Varying Weights in SELECT

  • Objectives are in competition resulting in trade-offs

  • A family of alternative solutions exists that are all equivalent


Multiobjective Optimisation

  • Evolutionary algorithms, e.g., GAs

    • operate with a population of individuals

    • well suited to search for multiple solutions in parallel

    • readily adapted to deal with multiobjective optimisation

  • MOGA: MultiObjective Genetic Algorithm

    • Fonseca & Fleming. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 28(1), 1998, 26-37.


MOGA

  • Multiple objectives are handled independently without summation and without weights

  • A hyper-surface is mapped out in the search space

    • represents a continuum of solutions where all solutions are seen as equivalent

    • represents compromises or trade-offs between the various objectives

    • solutions are called non-dominated, or Pareto solutions.

  • A family of non-dominated solutions is sought rather than a single solution


Dominance & Pareto Ranking

  • A non-dominated individual is one where an improvement in one objective results in a deterioration in one or more of the other objectives when compared with the other individuals in the population

  • Pareto ranking: an individual’s rank corresponds to the number of individuals in the current population by which it is dominated

[Figure: individuals plotted on axes f1 and f2 and labelled with their Pareto ranks (mostly 0, with others ranked 1, 2 and 4); points A and B are marked]
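
The ranking rule can be sketched in a few lines. This is an illustration under assumptions rather than the MoSELECT code: both objectives are assumed to be minimised, and the function names and sample points are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in one.

    Assumes every objective is to be minimised.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pareto_ranks(population):
    """Rank of each individual = number of individuals that dominate it."""
    return [sum(dominates(q, p) for q in population) for p in population]

# Hypothetical (f1, f2) values for five individuals
points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
ranks = pareto_ranks(points)
front = [p for p, r in zip(points, ranks) if r == 0]
print(ranks)   # [0, 0, 0, 1, 4]
print(front)   # the non-dominated individuals
```

The three rank-0 points trade f1 against f2 and none can be called better than the others, which is why a family of solutions is reported rather than one.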


SELECT

  • Initialise population

  • Select parents

  • Apply genetic operators

  • Calculate objectives: a, b, c...

  • Apply fitness function: f = w1a + w2b + w3c + ...

  • Rank based on fitness

  • Test for convergence

  • Single solution

MoSELECT*

  • Initialise population

  • Select parents

  • Apply genetic operators

  • Calculate objectives: a, b, c...

  • Calculate dominance: a, b, c

  • Rank using Pareto ranking, based on dominance

  • Test for convergence

  • Family of solutions

* Patent applied for


MoSELECT: Search Progress

[Figure: four panels showing the population after 0, 100, 1000 and 5000 iterations]


Family of Solutions

  • Each run of MoSELECT results in a family of solutions

  • Finding the same coverage of solutions using SELECT would require multiple runs using various combinations of weights

  • One run of MoSELECT takes the same CPU time as one run of SELECT

[Figure: family of solutions after 5000 iterations; axes D (diversity, ticks 0.574 to 0.594) and MW (ticks 0.58 to 0.64)]


Focused Library: Aminothiazoles

  • α-bromoketones & thioureas extracted from ACD

  • ADEPT used to

    • filter reactants (MW < 300; RB < 8)

    • enumerate virtual library => 12850 products (74 α-bromoketones & 170 thioureas)

  • MoSELECT used to design 15×30 subsets optimised on

    • Similarity to a target compound (Daylight fingerprints)

    • Cost ($/g)


MoSELECT Solutions: 1

[Figure: MoSELECT population at 0 iterations and after 5000 iterations]


MoSELECT Solutions: 2

  • Running MoSELECT with niching

[Figure: MoSELECT solutions after 5000 iterations]


Moving to > 2 Objectives: Parallel Graph Representation

[Figure: solutions after 5000 iterations; objectives D (diversity) and MW; axis ticks 0.578 to 0.594 and 0.58 to 0.64]

Each objective is scaled using the Max and Min values achieved when the objective is optimised independently
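
A small sketch of this scaling step is given below. The function name and the numbers are hypothetical; the per-objective minimum and maximum are assumed to have been found beforehand by optimising each objective on its own.

```python
def scale_objectives(solutions, minima, maxima):
    """Rescale every objective of every solution onto a common 0-1 axis."""
    return [
        [(v - lo) / (hi - lo) for v, lo, hi in zip(solution, minima, maxima)]
        for solution in solutions
    ]

# Two hypothetical solutions with two objectives each
scaled = scale_objectives(
    solutions=[[2.0, 10.0], [4.0, 30.0]],
    minima=[0.0, 10.0],   # best/worst values from independent runs (assumed)
    maxima=[4.0, 50.0],
)
print(scaled)   # [[0.5, 0.0], [1.0, 0.5]]
```

Once all objectives share the same 0 to 1 range, each solution can be drawn as one line across parallel vertical axes, however many objectives there are.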


Focused Library: Amides

  • 100 × 100 virtual library

  • MoSELECT used to design 10 × 10 subsets

  • Objectives

    • Similarity to a target

      • Sum of similarities using Daylight fingerprints

    • Predicted bioavailability

      • Each compound rated from 1 to 4

      • Sum of ratings

    • Hydrogen bond profile

    • Rotatable bond profile


MoSELECT Solutions

  • Population size 50

  • Iterations 5000

  • Niching 30%

  • Number of solutions = 11

  • CPU 53s (R12K 360 MHz)


Conclusions

  • Advantages of MoSELECT

    • a family of equivalent solutions is obtained in a single run with each solution representing one combinatorial library

    • this is achieved at vastly reduced computational cost compared to performing multiple runs of SELECT

    • no need to determine weights for objectives

    • optimisation of different types of objectives is readily achieved

    • visualisation of the search progress allows trade-offs between objectives to be observed

    • the user can make an informed choice on which solution(s) to explore


Acknowledgements

  • Illy Khatib, Peter Willett; Information Studies, University of Sheffield

  • Peter Fleming; Automatic Control and Systems Engineering, University of Sheffield

  • Darren Green, Andrew Leach; GlaxoSmithKline, UK

  • Funding by GlaxoSmithKline, UK

  • John Bradshaw; Daylight

  • Daylight for software support