A Multiobjective Approach to Combinatorial Library Design

Val Gillet

University of Sheffield, UK


Outline

  • SELECT

    • GA based program for combinatorial library design

    • Combinatorial subset selection in product-space

    • Multiobjective optimisation via weighted-sum fitness function

  • Limitations of a weighted-sum approach

  • MoSELECT

    • Multiobjective optimisation via MOGA


Library Design is a Multiobjective Optimisation Problem

  • Early HTS results disappointing

    • Low hit rates

    • Hits too lipophilic; too flexible; high molecular weights…

  • Diverse libraries

    • Distance-based/cell-based diversity

    • Bioavailability; cost; ease of synthesis…

  • Focused/targeted libraries

    • Similarity to known active; predicted active by QSAR model; fit to receptor site

    • Bioavailability; cost…


Product-Based Library Design

  • A two-component combinatorial library can be represented by a 2D array

  • A combinatorial subset can be defined by intersecting rows and columns of the array

  • Exploring all combinatorial subsets is equivalent to testing all permutations of the rows and columns of the array
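
The array view above can be sketched in a few lines of Python. This is only an illustration: the 4 × 5 library size, the chosen rows and columns, and the helper names `combinatorial_subset` and `count_subsets` are assumptions made for the example, not part of SELECT.

```python
from math import comb

def combinatorial_subset(rows, cols):
    """Return the products (row, col) at the intersection of the chosen rows and columns."""
    return [(r, c) for r in rows for c in cols]

def count_subsets(N1, N2, n1, n2):
    """Number of distinct n1 x n2 combinatorial subsets of an N1 x N2 library."""
    return comb(N1, n1) * comb(N2, n2)

# Hypothetical 4 x 5 virtual library: pick 2 reactants from pool 1 and 3 from pool 2.
subset = combinatorial_subset(rows=[0, 2], cols=[1, 3, 4])
print(subset)                      # six (row, col) products
print(count_subsets(4, 5, 2, 3))   # 6 * 10 = 60
```

Calling `count_subsets(100, 100, 30, 30)` for a 100 × 100 library gives a number far too large to enumerate, which is why a search method such as a GA is used.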


Selecting Combinatorial Subsets Using a GA

[Figure: example chromosome for a 6 × 4 subset, with one partition each for reactant pools R1 and R2; integer string: 11 8 2 30 7 25 10 1 19 18]

  • Chromosome encoding

    • each chromosome represents a combinatorial subset as an integer string

    • one partition for each reactant pool

    • the size of a partition equals the no. of reactants required from the corresponding pool

  • Crossover, mutation and roulette wheel parent selection are used to evolve new potential solutions
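
A minimal sketch of the encoding and of roulette wheel selection follows. It assumes two hypothetical pools of 30 reactants indexed from 0, placeholder fitness values, and a simple one-point mutation; crossover is omitted for brevity. The function names are illustrative and are not taken from SELECT.

```python
import random

def random_chromosome(pool_sizes, picks, rng):
    """One partition per reactant pool; each partition holds distinct reactant indices."""
    return [rng.sample(range(size), k) for size, k in zip(pool_sizes, picks)]

def roulette_select(population, fitnesses, rng):
    """Pick one parent with probability proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    threshold = rng.uniform(0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if running >= threshold:
            return individual
    return population[-1]  # guard against floating-point round-off

def mutate(chromosome, pool_sizes, rng):
    """Replace one reactant in one partition with an unused reactant from the same pool."""
    child = [list(part) for part in chromosome]
    p = rng.randrange(len(child))
    unused = [i for i in range(pool_sizes[p]) if i not in child[p]]
    if unused:
        child[p][rng.randrange(len(child[p]))] = rng.choice(unused)
    return child

rng = random.Random(0)
pool_sizes, picks = (30, 30), (6, 4)    # a 6 x 4 subset; pool sizes are assumed
population = [random_chromosome(pool_sizes, picks, rng) for _ in range(5)]
fitnesses = [1.0, 2.0, 3.0, 4.0, 5.0]   # placeholder scores
parent = roulette_select(population, fitnesses, rng)
child = mutate(parent, pool_sizes, rng)
print(parent, child)
```

Keeping each partition free of duplicates means every chromosome always decodes to a valid combinatorial subset of the required size.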


Multiobjective Optimisation in SELECT

  • Weighted-sum fitness function

    • enumerate the combinatorial library represented by a chromosome

    • calculate descriptors for molecules in the library

  • Objectives are scaled and user defined weights are applied
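
The weighted-sum idea, f = w1a + w2b + w3c + ..., can be sketched as below. The objective values, scaling ranges and weights are made up for illustration, and using a negative weight for an objective that should be minimised is an assumption of this sketch rather than a documented SELECT convention.

```python
def scale(value, lo, hi):
    """Map a raw objective value onto [0, 1] given its expected range."""
    return (value - lo) / (hi - lo)

def weighted_sum_fitness(objectives, ranges, weights):
    """f = w1*a + w2*b + ... computed on scaled objectives."""
    return sum(w * scale(v, lo, hi)
               for v, (lo, hi), w in zip(objectives, ranges, weights))

# Hypothetical library: diversity 0.58 (maximise), MW-profile mismatch 0.62 (minimise).
objectives = [0.58, 0.62]
ranges = [(0.0, 1.0), (0.0, 1.0)]
weights = [1.0, -0.5]   # assumed, user-defined weights
fitness = weighted_sum_fitness(objectives, ranges, weights)
print(fitness)
```

Changing `weights` changes which library scores best, which is why the choice of weights matters so much in this scheme.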


Multiobjective Optimisation in SELECT cont.

  • Diversity indices

    • distance-based (e.g. sum of pairwise dissimilarities using Daylight fingerprints)

    • cell-based

  • Physical property terms

    • minimise the difference between the distribution in the library and some reference distribution, e.g.

      • “drug-like” profile derived from WDI

  • Cost: £

    • minimise the cost of the library
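
Two of these terms can be sketched as follows. The sketch assumes the Tanimoto coefficient as the similarity measure, represents fingerprints as plain Python sets of "on" bit positions instead of real Daylight fingerprints, and uses a sum of absolute bin differences for the profile term; all three choices are assumptions for illustration.

```python
from itertools import combinations

def tanimoto_dissimilarity(fp_a, fp_b):
    """1 - Tanimoto similarity for fingerprints stored as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union

def sum_pairwise_dissimilarity(fingerprints):
    """Distance-based diversity: sum over all unordered pairs of molecules."""
    return sum(tanimoto_dissimilarity(a, b)
               for a, b in combinations(fingerprints, 2))

def profile_difference(library_profile, reference_profile):
    """Sum of absolute differences between two binned distributions (fractions per bin)."""
    return sum(abs(l - r) for l, r in zip(library_profile, reference_profile))

fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]                  # toy fingerprints
diversity = sum_pairwise_dissimilarity(fps)
mismatch = profile_difference([0.2, 0.5, 0.3], [0.1, 0.6, 0.3])
print(diversity, mismatch)
```

Dividing the sum by the number of pairs would give a mean dissimilarity between 0 and 1, which is easier to compare across libraries of different sizes.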


Library Enumeration in SELECT

  • Virtual library is enumerated upfront

    • ADEPT (A Daylight Enumeration and Profiling Tool)

    • Identify potential reactants

    • Filter out unwanted ones

    • Enumerate virtual library

      • Reaction Toolkit (Reaction transforms; MTZ language)

  • Descriptors are calculated upfront

  • Combinatorial subset accessed via fast lookup
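
The benefit of calculating descriptors upfront can be sketched with a precomputed table indexed by reactant pair. The molecular weights below are additive toy values used as placeholders, and `subset_values` is a hypothetical helper, not an ADEPT or SELECT function.

```python
# Descriptor table computed once, before the GA runs: mw[i][j] is the value
# for the product of reactant i (pool 1) and reactant j (pool 2).
# The additive toy numbers are placeholders, not real chemistry.
pool1_mw = [100.0, 120.0, 150.0, 90.0]
pool2_mw = [60.0, 75.0, 110.0]
mw = [[a + b for b in pool2_mw] for a in pool1_mw]

def subset_values(table, rows, cols):
    """Look up precomputed values for the subset defined by the chosen rows and columns."""
    return [table[r][c] for r in rows for c in cols]

values = subset_values(mw, rows=[0, 3], cols=[1, 2])
mean_mw = sum(values) / len(values)
print(values, mean_mw)
```

Because each fitness evaluation only indexes into the table, no molecule has to be built or profiled inside the GA loop.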


Example: Amide Library

  • 10K virtual library: 100 amines × 100 carboxylic acids

  • 30 × 30 amide subsets

  • Reactant-based selection: diversity (Diversity 0.564)

  • Product-based selection: diversity & molecular weight profile (Diversity 0.573)

  • WDI – World Drug Index

[Figure: molecular weight profiles for the WDI, the product-based library and the reactant-based library; axes: Molecular weight (0 to 800) and Percentage of Compounds (0 to 25)]


Limitations of a Weighted-Sum Fitness Function

  • Definition of fitness function difficult especially for different types of objectives

    • e.g. molecular weight profile and cost

  • Setting of weights is non-intuitive

  • Can result in regions of search space being obscured especially when objectives are in competition

  • Difficult to monitor progress since >1 objective to follow simultaneously

  • A single solution is found


Varying Weights in SELECT

  • Objectives are in competition resulting in trade-offs

  • A family of alternative solutions exists, all of which are equivalent


Multiobjective Optimisation

  • Evolutionary algorithms, e.g., GAs

    • operate with a population of individuals

    • well suited to search for multiple solutions in parallel

    • readily adapted to deal with multiobjective optimisation

  • MOGA: MultiObjective Genetic Algorithm

    • Fonseca & Fleming. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 28(1), 1998, 26-37.


MOGA

  • Multiple objectives are handled independently without summation and without weights

  • A hyper-surface is mapped out in the search space

    • represents a continuum of solutions where all solutions are seen as equivalent

    • represents compromises or trade-offs between the various objectives

    • solutions are called non-dominated, or Pareto solutions.

  • A family of non-dominated solutions is sought rather than a single solution


Dominance & Pareto Ranking

  • A non-dominated individual is one where an improvement in one objective results in a deterioration in one or more of the other objectives when compared with the other individuals in the population

  • Pareto ranking: an individual’s rank corresponds to the number of individuals in the current population by which it is dominated

[Figure: individuals plotted against objectives f1 and f2, each labelled with its Pareto rank (0, 1, 2 or 4); two individuals are marked A and B]
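
The ranking rule stated on this slide can be sketched directly. The sketch assumes that both objectives are to be minimised, and the (f1, f2) points are invented for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_ranks(population):
    """Rank of each individual = number of individuals in the population that dominate it."""
    return [sum(dominates(other, ind) for other in population)
            for ind in population]

# Hypothetical (f1, f2) values, both to be minimised.
points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
ranks = pareto_ranks(points)
non_dominated = [p for p, r in zip(points, ranks) if r == 0]
print(ranks)          # [0, 0, 0, 1, 4]
print(non_dominated)  # [(1, 5), (2, 3), (4, 1)]
```

Individuals with rank 0 are the non-dominated ones; together they form the family of trade-off solutions.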


SELECT

  • Initialise Population

  • Select parents

  • Apply genetic operators

  • Calculate objectives: a, b, c...

  • Apply fitness function: f = w1a + w2b + w3c + ...

  • Rank based on fitness

  • Test for convergence

  • Single solution

MoSELECT*

  • Initialise Population

  • Select parents

  • Apply genetic operators

  • Calculate objectives: a, b, c...

  • Calculate dominance: a, b, c

  • Rank using Pareto Ranking, based on dominance

  • Test for convergence

  • Family of solutions

* Patent Applied for


MoSELECT: Search Progress

[Figure: four panels showing the population after 0, 100, 1000 and 5000 iterations]


Family of Solutions

  • Each run of MoSELECT results in a family of solutions

  • Finding the same coverage of solutions using SELECT would require multiple runs using various combinations of weights

  • One run of MoSELECT takes the same CPU time as one run of SELECT

[Figure: family of solutions after 5000 iterations; axes: Diversity (ticks 0.574 to 0.594) and ΔMW (ticks 0.58 to 0.64)]
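
Why a weighted-sum search needs many runs to cover the same ground can be sketched with a toy trade-off curve. The three (f1, f2) points and the weight grid are invented, both objectives are assumed to be minimised, and `weighted_sum_winner` is a hypothetical helper.

```python
def weighted_sum_winner(points, w1, w2):
    """Return the point that minimises w1*f1 + w2*f2."""
    return min(points, key=lambda p: w1 * p[0] + w2 * p[1])

# Hypothetical non-dominated (f1, f2) trade-off points, both minimised.
front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]

# A single choice of weights returns a single solution...
single = weighted_sum_winner(front, 0.5, 0.5)

# ...so covering the front means repeating the search over many weight settings.
winners = set()
for i in range(11):
    w1 = i / 10
    winners.add(weighted_sum_winner(front, w1, 1.0 - w1))

print(single)
print(sorted(winners))
```

Each weight setting stands in for one full weighted-sum run, whereas a Pareto-based search aims to return all three points from one run.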


Focused Library: Aminothiazoles

  • α-bromoketones & thioureas extracted from ACD

  • ADEPT used to

    • filter reactants (MW < 300; RB < 8)

    • enumerate virtual library => 12850 products (74 α-bromoketones & 170 thioureas)

  • MoSELECT used to design 15×30 subsets optimised on

    • Similarity to a target compound (Daylight fingerprints)

    • Cost ($/g)


MoSELECT Solutions: 1

[Figure: population at 0 iterations and after 5000 iterations]


MoSELECT Solutions: 2

[Figure: result of running MoSELECT with niching, after 5000 iterations]


Moving to > 2 Objectives: Parallel Graph Representation

[Figure: solutions after 5000 iterations; axes: Diversity (ticks 0.578 to 0.594) and ΔMW (ticks 0.58 to 0.64)]

Each objective is scaled using the Max and Min values achieved when the objective is optimised independently
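
The scaling step can be sketched as a min-max transform applied per objective, so that every axis of the parallel graph runs from 0 to 1. The objective names, bounds and solution values below are placeholders, not results from MoSELECT.

```python
def min_max_scale(value, vmin, vmax):
    """Scale a value onto [0, 1] using the Min and Max found for that objective."""
    return (value - vmin) / (vmax - vmin)

# Hypothetical Min/Max per objective, as if from optimising each one independently.
bounds = {"diversity": (0.50, 0.60), "mw_profile": (0.40, 0.80), "cost": (100.0, 500.0)}

# One hypothetical solution; its scaled values form one polyline on the parallel graph.
solution = {"diversity": 0.55, "mw_profile": 0.50, "cost": 400.0}
scaled = {name: min_max_scale(solution[name], *bounds[name]) for name in bounds}
print(scaled)
```

Plotting the `scaled` values of every solution against the list of objective names gives one line per library, which makes trade-offs among more than two objectives visible at once.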


Focused Library: Amides

  • 100 × 100 virtual library

  • MoSELECT used to design 10 × 10 subsets

  • Objectives

    • Similarity to a target

      • Sum of similarities using Daylight fps

    • Predicted bioavailability

      • Each compound rated from 1 to 4

      • Sum of ratings

    • Hydrogen bond profile

    • Rotatable bond profile


MoSELECT Solutions

  • Population size 50

  • Iteration 5000

  • Niching 30%

  • Number of solutions = 11

  • CPU 53s (R12K 360 MHz)


Conclusions

  • Advantages of MoSELECT

    • a family of equivalent solutions is obtained in a single run with each solution representing one combinatorial library

    • this is achieved at vastly reduced computational cost compared to performing multiple runs of SELECT

    • no need to determine weights for objectives

    • optimisation of different types of objectives is readily achieved

    • visualisation of the search progress allows trade-offs between objectives to be observed

    • the user can make an informed choice on which solution(s) to explore


Acknowledgements

  • Illy Khatib, Peter Willett; Information Studies, University of Sheffield

  • Peter Fleming; Automatic Control and Systems Engineering, University of Sheffield

  • Darren Green, Andrew Leach; GlaxoSmithKline, UK

  • Funding by GlaxoSmithKline, UK

  • John Bradshaw; Daylight

  • Daylight for software support

