
Problems and approaches in computational chemistry

Chemical descriptors and molecular graphs

Alessandra Roncaglioni - IRFMN

[email protected]



Outline

  • Descriptors definition

  • Structure → Descriptors

  • Descriptors classification (bi- or tri- dimensional)

  • Pros & Cons

  • Overview of common descriptor classes (mainly 2D)

  • Applications

  • Software resources

  • Further reading

Problems and approaches in computational chemistry – 21 April 2008 – DEI – Milano



Introduction

  • Molecular descriptors are numerical values that characterize properties of molecules

  • Examples:

    • Physicochemical properties (empirical)

    • Values from algorithms, such as 2D fingerprints

  • Vary in complexity of encoded information and in compute time


Theoretical descriptors

“A molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment”

www.moleculardescriptors.eu


Desirable descriptor characteristics

  • Invariance with respect to labelling and numbering of the molecule's atoms

  • Invariance with respect to roto-translation of the molecule

  • An unambiguous, computable definition

  • Values in a suitable numerical range

  • Allows structural interpretation

  • No trivial correlation with other molecular descriptors

  • Gradual change in value with gradual changes in the molecular structure

  • Widely applicable

  • Preferably, allows reversible decoding (back from the descriptor value to the structure)


From chemical compounds to descriptors

CAS RN. 145131-25-5

N-(2,6-Bis(1-methylethyl)phenyl)-N'-((1-(1-methyl-1H-indol-3-yl)cyclohexyl)methyl)urea

CC(C)C1=CC=CC(C(C)C)=C1NC(=O)NCC2(CCCCC2)C3=CN(C)C4=C3C=CC=C4


Descriptors classification

Depending on the structural dimensionality (listed in order of increasing complexity):

  • Up to 2D (0D-2D)

    Derived from the atomic composition and connectivity of molecules

  • 3D

    Encoding energetic and spatial information

  • Molecular interaction fields (MIF)

    Encoding electrostatic and steric variation


2D Descriptors (I)

  • Many groups accounting for different characteristics

  • May require explicit hydrogens (check the file format)

  • Fast to calculate (almost all expert systems rely on 2D descriptors)

  • More reproducible (do not require a 3D structure)

    but ...

  • May focus on local contributions, neglecting intramolecular interactions

  • Ignore conformational flexibility


2D Descriptors (II)

but ...

  • Ignore stereo configuration

  • Not invariant to tautomerism


3D Descriptors (I)

  • Invariant to roto-translation

  • Require a conformational search (sampling)

  • Followed by QM/MM optimization (minimization)


3D Descriptors (II)

  • More complete and realistic description of relevant molecular characteristics

  • Can discriminate among isomers and provide hints to select the most stable tautomer

    but ...

  • Computationally more demanding

  • Involve stochastic steps: non-deterministic results

  • Results depend on the QM/MM level of theory used for the optimization

  • Reference structure: the minimum-energy conformation in vacuum, which is not necessarily the bioactive one


MIF (I)

  • Requires 3D conformations aligned in Euclidean space

  • Relates variation in the field to variation in the activity (3D-QSAR)

[Table: one row per molecule (Mol 1 ... Mol n); the columns hold the steric field values St1 ... Stm followed by the electrostatic field values El1 ... Elm]


MIF (II)

Probes:

Probe   Description
N3+     sp3 amine NH3 cation
N2+     sp3 amine NH2 cation
N2:     sp3 NH2 with lone pair
N2=     sp2 amine NH2 cation
N2      Neutral flat NH2, e.g. amide
N1+     sp3 amine NH cation
N1:     sp3 NH with lone pair
N1=     sp2 amine NH cation
N1      Neutral flat NH, e.g. amide
NH=     sp2 NH with lone pair
N1#     sp NH with one hydrogen
N:      sp3 N with lone pair
N:=     sp2 N with lone pair
N:#     sp N with lone pair
N-:     Anionic tetrazole N
NM3     Trimethyl-ammonium cation
O       sp2 carbonyl oxygen
O::     sp2 carboxy oxygen atom
O-      sp2 phenolate oxygen
O=      O of SO4 or sulfonamide
OH      Phenol or carboxy OH
O1      Alkyl hydroxy OH group
OC2     Ether oxygen
OES     sp3 ester oxygen atom
ON      Oxygen of nitro group
OS      O of sulfone / sulfoxide
OH2     Water
OFU     Furan oxygen atom
C3      Methyl CH3 group
C1=     sp2 CH aromatic or vinyl
....    ............
BOTH    The amphipathic probe
DRY     The hydrophobic probe

Contour map

Green = steric +; Yellow = steric -; Red = charge -; Blue = charge +

Steric interaction (van der Waals energy calculated by a Lennard-Jones function)

Electrostatic interaction (calculated by a Coulomb-type function)

... ... ...

Hydrogen-bonding energy

Solvation energy


MIF (III)

  • More biologically plausible (receptor interactions)

  • Identifies areas responsible for the variation of the activity

    but …

  • Very sensitive to conformation selection and to the chosen alignment

  • Requires proper selection of the force field

  • Large number of grid-point contributions

  • QSAR modelling complexity


Types of descriptors

  • Constitutional descriptors

  • Topological descriptors (topological indexes, connectivity indexes, information contents)

  • Atom centred fragments

  • Functional groups

  • Fingerprints

  • Electrostatic descriptors(*) (charge descriptors)

  • Geometric descriptors*

  • Physico-chemical properties

  • Quantum-chemical descriptors*

  • Thermodynamic descriptors(*)

  • Pharmacophores

  • WHIM & GETAWAY*

  • BCUT (or Burden eigenvalues)

  • Autocorrelation descriptors

  • EVA descriptors*

* 3D descriptors


Constitutional descriptors

  • The simplest and most commonly used descriptors

  • Reflect the molecular composition of a compound without any information about its molecular geometry

  • Examples

    • Molecular weight

    • Count of atoms and bonds

    • Count of rings
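
As a minimal sketch, these counts can be read straight off a connection table. The example below uses the 13-atom table from the "Connection table" slide of this deck, read as element, attached-hydrogen count, and neighbour/bond-order pairs. The atomic masses, the function name `constitutional` and the ring count via the cyclomatic number (bonds minus atoms plus one, valid for a connected graph) are assumptions added for illustration.

```python
# Connection table: atom number -> (element, attached H count, {neighbour: bond order}).
# Transcribed from the "Connection table" slide; the column meaning is an interpretation.
TABLE = {
    1: ("O", 1, {2: 1}),
    2: ("C", 0, {1: 1, 3: 2, 4: 1}),
    3: ("O", 0, {2: 2}),
    4: ("C", 1, {2: 1, 5: 1, 6: 1}),
    5: ("N", 2, {4: 1}),
    6: ("C", 2, {4: 1, 7: 1}),
    7: ("C", 0, {6: 1, 8: 2, 12: 1}),
    8: ("C", 1, {7: 2, 9: 1}),
    9: ("C", 1, {8: 1, 10: 2}),
    10: ("C", 0, {9: 2, 11: 1, 13: 1}),
    11: ("C", 1, {10: 1, 12: 2}),
    12: ("C", 1, {11: 2, 7: 1}),
    13: ("O", 1, {10: 1}),
}

# Rounded standard atomic masses (assumption, not taken from the slides).
MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def constitutional(table):
    """Return a few constitutional descriptors for a connected molecule."""
    heavy_atoms = len(table)
    hydrogens = sum(h for _, h, _ in table.values())
    # Every heavy-atom bond is listed twice, once from each end.
    bonds = sum(len(neigh) for _, _, neigh in table.values()) // 2
    # Cyclomatic number of a connected graph.
    rings = bonds - heavy_atoms + 1
    weight = sum(MASS[el] for el, _, _ in table.values()) + hydrogens * MASS["H"]
    return {
        "heavy_atoms": heavy_atoms,
        "hydrogens": hydrogens,
        "bonds": bonds,
        "rings": rings,
        "molecular_weight": round(weight, 2),
    }


descriptors = constitutional(TABLE)
print(descriptors)
```

None of these values depends on atom numbering or on 3D coordinates, which is why constitutional descriptors are cheap and reproducible.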


Molecular graph

  • A molecular graph or chemical graph is a representation of the structural formula of a chemical compound in terms of graph theory.

  • It is a convenient and natural way of representing relationships between objects: the objects (here, atoms) are represented by vertices and the relationships between them (bonds) by edges.

[Figure: example molecular graph with one vertex and one edge labelled]


Topological descriptors

  • Calculated from the 2D graph of the molecule on the basis of connection tables or closely related formats

    • e.g. the distance matrix

      • an N x N table showing the distance (in bonds) between each pair of atoms

  • Obtained by operations on the distance matrix; their values are independent of vertex numbering or labelling (graph invariants)

  • Characterize structures according to size, degree of branching, overall shape, symmetry and cyclicity


Connection table

Atom  Element  H count  Neighbour / bond-order pairs
   1  O        1        2 1
   2  C        0        1 1   3 2   4 1
   3  O        0        2 2
   4  C        1        2 1   5 1   6 1
   5  N        2        4 1
   6  C        2        4 1   7 1
   7  C        0        6 1   8 2   12 1
   8  C        1        7 2   9 1
   9  C        1        8 1   10 2
  10  C        0        9 2   11 1   13 1
  11  C        1        10 1   12 2
  12  C        1        11 2   7 1
  13  O        1        10 1


Distance matrix
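
A minimal sketch of how such a matrix can be built, assuming the bond list implied by the 13-atom connection table of the previous slide. Bond orders are ignored, so every bond counts as distance 1. The Floyd-Warshall all-pairs shortest-path scheme is a choice made here for brevity, not necessarily what the original slide used.

```python
# Heavy-atom bonds read from the connection table (atom numbers are 1-based).
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (6, 7), (7, 8), (8, 9),
         (9, 10), (10, 11), (11, 12), (12, 7), (10, 13)]


def distance_matrix(n_atoms, edges):
    """Topological distance matrix (number of bonds on the shortest path)."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n_atoms)] for i in range(n_atoms)]
    for a, b in edges:
        d[a - 1][b - 1] = d[b - 1][a - 1] = 1
    # Floyd-Warshall: allow each atom k in turn as an intermediate stop.
    for k in range(n_atoms):
        for i in range(n_atoms):
            for j in range(n_atoms):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


D = distance_matrix(13, EDGES)
for row in D:
    print(" ".join(f"{x:2d}" for x in row))
```

The matrix is symmetric with a zero diagonal, so only its upper (or lower) triangle carries information.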


Wiener index

  • Counts the number of bonds between each pair of atoms and sums these distances over all pairs

  • Equivalently, add up all the off-diagonal elements of the distance matrix and divide by 2 (because the matrix is symmetrical)

W = 268
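
The value above can be checked with a short sketch, assuming the bond list of the 13-atom connection table shown earlier. Distances are obtained here by breadth-first search from every atom (an implementation choice); the relabelling check at the end illustrates that W is a graph invariant.

```python
from collections import deque

# Heavy-atom bonds read from the connection table slide.
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (6, 7), (7, 8), (8, 9),
         (9, 10), (10, 11), (11, 12), (12, 7), (10, 13)]


def wiener_index(edges):
    """Sum of shortest-path distances (in bonds) over all atom pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    total = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    # Each pair was counted twice (once from each end).
    return total // 2


W = wiener_index(EDGES)
print(W)

# Renumbering the atoms (here: reversing the labels) must not change W.
relabelled = [(14 - a, 14 - b) for a, b in EDGES]
W_relabelled = wiener_index(relabelled)
```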


Molecular connectivity indexes

  • A whole series of indexes, developed by Kier & Hall in the late ‘70s, following earlier work by Randić

  • Identify all possible subgraphs of different sizes in the molecule

  • Size of subgraph determines the order of the index

    • 0-bond subgraph (a single atom) gives a zero-order index

    • 1-bond subgraph gives a 1st order index

    • 2-bond subgraph gives a 2nd order index

    • 3-bond subgraph gives a 3rd order index

    • ...


Randić index

  • Calculated from the H-depleted molecular graph, where each vertex is weighted by the vertex degree, i.e. the number of connected non-hydrogen atoms

  • Example:

[Figure: seven-vertex, six-edge molecular graph annotated with the values below]

valence at vertices (vertex degrees): 1, 1, 1, 1, 2, 3, 3

bond values as products of the vertex valences: 3, 3, 3, 9, 6, 2

edge terms as the reciprocal of the square root of the bond values: 0.577, 0.577, 0.577, 0.333, 0.408, 0.707

Randić index = sum of edge terms = 3.179
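
The calculation can be sketched in a few lines. The slide's figure is not recoverable, so the graph below is an assumption: the carbon skeleton of 2,3-dimethylpentane, a seven-vertex tree whose vertex degrees produce the same edge terms as those quoted on the slide (0.577 three times, 0.333, 0.408 and 0.707).

```python
from math import sqrt

# Assumed example graph: H-depleted 2,3-dimethylpentane.
# Main chain 1-2-3-4-5, methyl groups 6 (on atom 2) and 7 (on atom 3).
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]


def vertex_degrees(edges):
    """Number of non-hydrogen neighbours of each vertex."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def randic_index(edges):
    """Sum over bonds of 1 / sqrt(degree_i * degree_j)."""
    degree = vertex_degrees(edges)
    return sum(1 / sqrt(degree[a] * degree[b]) for a, b in edges)


degree = vertex_degrees(EDGES)
bond_values = sorted(degree[a] * degree[b] for a, b in EDGES)
chi = randic_index(EDGES)
print(bond_values, round(chi, 3))
```

Summing the unrounded edge terms gives about 3.181; the 3.179 quoted on the slide is what one obtains by adding the terms after rounding each to three decimals.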


Kier & Hall indexes

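  • Chi indexes introduce valence values to encode sigma, pi, and lone-pair electrons

    $^1\chi^{\upsilon} = \sum_{\text{bonds } i\text{-}j} (\delta_i \delta_j)^{-1/2}$

    δi and δj (i ≠ j) = values of the atomic connectivity of the two bonded atoms

  • Atomic connectivity δi is calculated by:

    $\delta_i = (Z_i^{\upsilon} - H_i) / (Z_i - Z_i^{\upsilon} - 1)$

    Zi = total number of electrons in the i-th atom

    Zi υ = number of valence electrons

    Hi = number of H atoms attached to the i-th atom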
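
A small numerical sketch, assuming the usual Kier-Hall definition of the valence connectivity, δ = (Zv - H) / (Z - Zv - 1), and a first-order index formed as the sum over bonds of 1/sqrt(δi δj). The function names and the ethanol example are illustrative choices, not taken from the slides.

```python
from math import sqrt


def valence_delta(z, z_valence, h):
    """Kier-Hall valence connectivity of one atom.

    z         total number of electrons (atomic number)
    z_valence number of valence electrons
    h         number of attached hydrogens
    """
    return (z_valence - h) / (z - z_valence - 1)


def first_order_valence_chi(deltas, bonds):
    """Sum over bonds of 1 / sqrt(delta_i * delta_j)."""
    return sum(1 / sqrt(deltas[i] * deltas[j]) for i, j in bonds)


# Ethanol, CH3-CH2-OH, heavy atoms only.
deltas = {
    "C1": valence_delta(6, 4, 3),  # methyl carbon
    "C2": valence_delta(6, 4, 2),  # methylene carbon
    "O": valence_delta(8, 6, 1),   # hydroxyl oxygen
}
chi1v = first_order_valence_chi(deltas, [("C1", "C2"), ("C2", "O")])
print(deltas, round(chi1v, 3))
```

For second-row atoms the denominator is 1, so δ reduces to the number of valence electrons not used for bonds to hydrogen; for heavier atoms such as chlorine (Z = 17, Zv = 7) the denominator scales the value down.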


    Kier Shape Indexes

    • Characterize aspects of molecular shape

      • Compare the molecule with the “extreme shapes” possible for that number of atoms

    • Based on the number of atoms (N) and the number of paths of m bonds in the graph ($^mP$; $^1P$ is the number of bonds):

      • $^1\kappa = N (N-1)^2 / (^1P)^2$

      • $^2\kappa = (N-1) (N-2)^2 / (^2P)^2$

      • $^3\kappa = (N-1) (N-3)^2 / (^3P)^2$ (if N is odd)

      • $^3\kappa = (N-3) (N-2)^2 / (^3P)^2$ (if N is even)

    • Alpha-modified kappa indexes can be generated by taking into account the sizes of atoms relative to a C sp3 atom

    • A molecular flexibility index is derived from these

      $\Phi = {}^1\kappa_{\alpha} \, {}^2\kappa_{\alpha} / N$
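
A minimal sketch of the unmodified kappa indexes, assuming that the P terms are the counts of simple paths of one, two and three bonds in the H-depleted graph. The alpha correction is omitted, which corresponds to a skeleton made only of sp3 carbons, so the flexibility value printed here is the uncorrected product. The function names are hypothetical.

```python
def count_paths(adj, n_bonds):
    """Count simple paths made of `n_bonds` bonds (each path counted once)."""
    def extend(path):
        if len(path) == n_bonds + 1:
            return 1
        return sum(extend(path + [v]) for v in adj[path[-1]] if v not in path)

    # Every path is found twice, once from each end.
    return sum(extend([start]) for start in adj) // 2


def kappa_indexes(edges):
    """Return (kappa1, kappa2, kappa3, flexibility) without alpha correction.

    Raises ZeroDivisionError if the graph has no two- or three-bond paths.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    n = len(adj)
    p1, p2, p3 = (count_paths(adj, m) for m in (1, 2, 3))
    k1 = n * (n - 1) ** 2 / p1 ** 2
    k2 = (n - 1) * (n - 2) ** 2 / p2 ** 2
    if n % 2:
        k3 = (n - 1) * (n - 3) ** 2 / p3 ** 2
    else:
        k3 = (n - 3) * (n - 2) ** 2 / p3 ** 2
    return k1, k2, k3, k1 * k2 / n


n_pentane = kappa_indexes([(1, 2), (2, 3), (3, 4), (4, 5)])
isopentane = kappa_indexes([(1, 2), (2, 3), (3, 4), (2, 5)])
print(n_pentane, isopentane)
```

The linear chain gives the largest second-order value for five atoms, and branching lowers it, which is the "extreme shape" comparison the indexes are built on.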


    Information content indexes

    • Defined on the basis of the Shannon information theory

      $IC = -\sum_i (n_i / n) \log_2 (n_i / n)$

      ni = number of atoms in the i-th class

      n = total number of atoms in the molecule

    • Classes are determined by the coordination sphere taken into account, leading to indexes of different order k.

    • Other information content indices:

      SIC (structural IC): $SIC = IC / \log_2 n$

      CIC (complementary IC): $CIC = \log_2 n - IC$

      BIC (bonding IC): $BIC = IC / \log_2 q$

      q = number of edges
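
As a sketch under stated assumptions: the Shannon form IC = -Σ (ni/n) log2(ni/n) is taken as the definition, with SIC = IC / log2 n, CIC = log2 n - IC and BIC = IC / log2 q as the derived indices. How atoms are partitioned into classes depends on the order k; the example simply uses element types (order zero) for a molecule of formula C9H11NO3, with all 24 bonds, including those to hydrogen, counted as edges.

```python
from math import log2


def information_content(class_sizes):
    """Shannon information content (bits per atom) of an atom partition."""
    n = sum(class_sizes)
    return -sum((ni / n) * log2(ni / n) for ni in class_sizes)


def ic_family(class_sizes, n_edges):
    """IC together with the structural, complementary and bonding variants."""
    n = sum(class_sizes)
    ic = information_content(class_sizes)
    return {
        "IC": ic,
        "SIC": ic / log2(n),
        "CIC": log2(n) - ic,
        "BIC": ic / log2(n_edges),
    }


# Zero-order classes for C9H11NO3: 9 C, 11 H, 1 N, 3 O; 24 bonds in total.
example = ic_family([9, 11, 1, 3], n_edges=24)
print({name: round(value, 3) for name, value in example.items()})

# Sanity case: two equally populated classes carry exactly one bit per atom.
check = ic_family([2, 2], n_edges=4)
```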


    Considerations about topological descriptors

    • Frequently used, easily calculated

    • It is often difficult to interpret the chemical meaning of higher-order indexes

    • Topological indexes effectively encode the same information as fingerprint fragments

      • in a less obvious way

      • but can be processed numerically


    Atom centred fragments & functional groups

    • Number of specific atom types in a molecule calculated by knowing the molecular composition and atom connectivities

    • Number of specific functional groups in a molecule, calculated by knowing the molecular composition and atom connectivities


    2D Fingerprints

    • Two types:

      • One based on a fragment dictionary

        • Each bit position corresponds to a specific substructure fragment

        • Fragments that occur infrequently may be more useful

      • Another based on hashed methods

        • Not dependent on a pre-defined dictionary

        • Any fragment can be encoded

    • Originally designed for substructure searching, not for molecular descriptors
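
The hashed variant can be sketched as follows. This is a toy illustration, not the scheme of any particular toolkit: fragments are linear atom paths of up to three bonds labelled only by element symbol (bond orders are ignored), and each fragment is mapped to one bit through an MD5 hash. The function name, path length and bit count are assumptions.

```python
import hashlib


def hashed_fingerprint(atoms, edges, max_bonds=3, n_bits=64):
    """Toy hashed path fingerprint returned as a string of 0/1 characters.

    atoms: {atom id: element symbol}; edges: list of bonded atom id pairs.
    """
    adj = {a: [] for a in atoms}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    fragments = set()

    def walk(path):
        labels = tuple(atoms[i] for i in path)
        # A path and its reverse describe the same fragment.
        fragments.add("-".join(min(labels, labels[::-1])))
        if len(path) <= max_bonds:
            for v in adj[path[-1]]:
                if v not in path:
                    walk(path + [v])

    for atom in atoms:
        walk([atom])

    bits = [0] * n_bits
    for fragment in fragments:
        digest = hashlib.md5(fragment.encode()).digest()
        # Different fragments may collide on the same bit.
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return "".join(map(str, bits))


# Ethanol heavy atoms, numbered two different ways.
fp = hashed_fingerprint({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])
fp_renumbered = hashed_fingerprint({10: "O", 20: "C", 30: "C"}, [(30, 20), (20, 10)])
print(fp)
```

No fragment dictionary is needed, at the price that a set bit can no longer be traced back to a single substructure.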


    Fragment dictionaries

    000101000101000100000000011010100110101000000101000000001000

    000101000101000100000000011010100110101000000001000000001000
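
The two bit strings above differ in a single position. A minimal sketch of how their similarity could be quantified, assuming the binary Tanimoto coefficient c / (a + b - c), where a and b are the numbers of bits set in each string and c the number set in both:

```python
FP_A = "000101000101000100000000011010100110101000000101000000001000"
FP_B = "000101000101000100000000011010100110101000000001000000001000"


def tanimoto(fp_a, fp_b):
    """Binary Tanimoto coefficient of two equal-length 0/1 strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")
    b = fp_b.count("1")
    c = sum(x == y == "1" for x, y in zip(fp_a, fp_b))
    if a + b - c == 0:
        raise ValueError("both fingerprints are empty")
    return c / (a + b - c)


similarity = tanimoto(FP_A, FP_B)
print(round(similarity, 4))
```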


    Pharmacophores

    • Used in drug design

    • Based on atoms or substructures thought to be relevant for receptor binding: specification of the spatial arrangement of a small number of atoms or functional groups

    • Typically include H bond donors and acceptors, charged centers, aromatic ring centers and hydrophobic centers

    • With the model in hand, search databases for molecules that fit this spatial environment

    • Might be 3D


    Creating a Pharmacophore


    Physico-chemical Properties

    • You will hear about them during the QSPR lesson

    • The most widely used descriptor in QSAR is hydrophobicity

      • LogP – the logarithm of the partition coefficient between n-octanol and water

      • LogD – LogP corrected for the dissociated (ionized) fraction of the compound

    • Experimentally assessed with the shake-flask method or reversed-phase HPLC

    • It is often useful to be able to calculate a physico-chemical property for a compound from its structure
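
The LogP to LogD correction can be sketched for the simplest case. The assumptions, none of which come from the slides, are a single ionizable group, partitioning of the neutral form only, and the textbook relation logD = logP - log10(1 + 10^(pH - pKa)) for an acid (with the exponent sign reversed for a base). The numbers in the example are invented.

```python
from math import log10


def log_d(log_p, pka, ph, acid=True):
    """logD of a monoprotic acid or base, assuming only the neutral form partitions."""
    exponent = ph - pka if acid else pka - ph
    return log_p - log10(1 + 10 ** exponent)


# Hypothetical acid with logP = 2.0 and pKa = 4.0.
at_low_ph = log_d(2.0, 4.0, ph=1.0)
at_pka = log_d(2.0, 4.0, ph=4.0)
at_blood_ph = log_d(2.0, 4.0, ph=7.4)
print(round(at_low_ph, 3), round(at_pka, 3), round(at_blood_ph, 3))
```

At a pH well below the pKa the acid is almost entirely neutral and logD approaches logP; each pH unit above the pKa then lowers logD by roughly one unit.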


    LogP calculation

    • Many methods have been proposed for calculating a good estimate for LogP

  • Fragment-based methods (ClogP)

    • pioneered by Corwin Hansch and Al Leo (Pomona College)

    • identify large fragments, whose contribution to logP value is known from their occurrence in other compounds with measured logP

    • large “training set” of compounds with accurately-measured logP (the “Starlist”)

    • works very well if test compound has the right fragments

      • problems arise if test compound contains fragments that are “missing” from the training set


    LogP calculation

    • Atom-based methods (AlogP, XlogP, SlogP)

    • pioneered by Gordon Crippen (Univ. Michigan)

    • based on identifying a series of “atom types” in the molecule

      • essentially, small atom-centred fragments

      • usually 60-200 such fragments are involved

    • each atom-type is assigned a numerical value

    • logP is obtained by adding values for the atom types present in the test molecule

    • atom-type values are obtained by regression analysis, based on a set of compounds with measured logP

    • sometimes some extra correction factors are used too


    Summary


    Rognan D., British Journal of Pharmacology (2007) 152, 38–52



    Quantitative Structure-Activity Relationships

    • Tomorrow …

    • Lessons 4&5


    Chemoinformatics

    • Molecular database management

    • Reverse engineering

    • Chemical similarity assessment


    Molecular similarity

    • The descriptors of a molecule can be considered a vector of attributes (properties).

    • The attributes may be real numbers (continuous variables) or binary (binary variables).

    Coefficient                        Binary variables    Continuous variables
    Tanimoto similarity coefficient    range 0 to 1        range -0.333 to +1
    Hodgkin index                      range 0 to 1        range -1 to +1
    Euclidean distance                 range 0 to N        range 0 to ∞

    a = number of bits on for A

    b = number of bits on for B

    c = number of bits on for both A and B

    X = descriptor vectors (continuous case)
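
For continuous descriptor vectors, the three measures can be sketched as below. The formulas are the commonly used forms and are an assumption here, since the slide's equations are not recoverable: Tanimoto = x·y / (|x|² + |y|² - x·y), Hodgkin = 2 x·y / (|x|² + |y|²), and the ordinary Euclidean distance.

```python
from math import sqrt


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def tanimoto_continuous(x, y):
    """Tanimoto coefficient for real-valued vectors (not both all-zero)."""
    xy = _dot(x, y)
    return xy / (_dot(x, x) + _dot(y, y) - xy)


def hodgkin(x, y):
    """Hodgkin (Dice-type) index for real-valued vectors (not both all-zero)."""
    return 2 * _dot(x, y) / (_dot(x, x) + _dot(y, y))


def euclidean(x, y):
    """Euclidean distance; a dissimilarity, so 0 means identical."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x, y = [1.0, 2.0], [2.0, 1.0]
print(tanimoto_continuous(x, y), hodgkin(x, y), euclidean(x, y))

# Opposite vectors reach the lower bounds quoted in the table.
lowest_tanimoto = tanimoto_continuous([1.0, 0.0], [-1.0, 0.0])
lowest_hodgkin = hodgkin([1.0, 0.0], [-1.0, 0.0])
```

Because the results depend on the scale of each descriptor, continuous descriptors are usually standardized before such coefficients are compared.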


    Drug design

    • High-throughput virtual screening


    Software resources

    • Database of calculated descriptors

      • MOLE db http://michem.disat.unimib.it/mole_db/

    • Commercial software

      • CODESSA, Dragon, MDL, TSAR, ....

    • Free software

      • Virtual Computational Chemistry Laboratory www.vcclab.org

      • MODEL - Molecular Descriptor Lab http://jing.cz3.nus.edu.sg/cgi-bin/model/model.cgi

    • Open source software/libraries

      • Chemistry Development Kit (CDK)

        http://almost.cubic.uni-koeln.de/cdk/cdk_top

      • Linux4Chemistry http://www.redbrick.dcu.ie/~noel/linux4chemistry/


    Further reading

    • Web

      • www.moleculardescriptors.eu

    • Book

      • “Handbook of Molecular Descriptors”. Roberto Todeschini and Viviana Consonni, Wiley-VCH, 2000.

    • Papers

      • Estrada, E., Molina, E. and Perdomo-López, I. (2001). Can 3D Structural Parameters Be Predicted from 2D (Topological) Molecular Descriptors? J. Chem. Inf. Comput. Sci., 41, 1015-1021.

      • Katritzky, A.R. and Gordeeva, E.V. (1993). Traditional Topological Indices vs Electronic, Geometrical, and Combined Molecular Descriptors in QSAR/QSPR Research. J. Chem. Inf. Comput. Sci., 33, 835-857.

      • Randić, M. (1990). The Nature of the Chemical Structure. J. Math. Chem., 4, 157-184.

      • Tetko, I.V. (2003). The WWW as a Tool to Obtain Molecular Parameters. Mini Reviews in Medicinal Chemistry, 3, 809-820.


    Concluding remarks

    • Depending on the application, define the preferred level of complexity for the chemical description

    • Avoid using meaningless numbers: all descriptor types have advantages and limitations, but easily interpretable descriptors might be preferred

    • Examples


    Tautomers (I)


    Tautomers (II)

    Predicted values for logBCF model

    Lipophilicity descriptor variation


    3D descriptors variability (I)

    LUMO energy

    Intra Lab.

    Inter Lab. (PM3)

    Inter Lab. (AM1)


    3D descriptors variability (II)

    Dipole moment

    Intra Lab.

    Inter Lab. (PM3)

    Lab 1

    Lab 2

    Lab 3
