Presentation Transcript
slide1

Department of Environmental Sciences

University of Milano - Bicocca

P.za della Scienza, 1 - 20126 Milano (Italy)

Website: michem.unimib.it/chm/

Milano Chemometrics and QSAR Research Group

Roberto Todeschini

Viviana Consonni

Manuela Pavan

Andrea Mauri

Davide Ballabio

Alberto Manganaro

chemometrics

molecular descriptors

QSAR

multicriteria decision making

environmetrics

experimental design

artificial neural networks

statistical process control

slide2

Roberto Todeschini

Milano Chemometrics and QSAR Research Group

Molecular descriptors

Constitutional descriptors and graph invariants

Iran - February 2009

slide3

Content

  • Counting descriptors
  • Empirical descriptors
  • Fragment descriptors
  • Molecular graphs
  • Topological descriptors
slide4

Counting descriptors

  • Each descriptor represents the number of elements of some defined chemical quantity.
  • For example:
  • the number of atoms or bonds
  • the number of carbon or chlorine atoms
  • the number of OH or C=O functional groups
  • the number of benzene rings
  • the number of defined molecular fragments
slide5

Counting descriptors

A sum of some atomic or bond property, as well as its average, is also considered a count descriptor.

  • For example:
  • molecular weight and average molecular weight
  • sum of the atomic electronegativities
  • sum of the atomic polarizabilities
  • sum of the bond orders
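A minimal sketch of how such count descriptors could be computed from a plain list of element symbols. The descriptor labels (`nAT`, `AMW`, ...) and the rounded atomic weights are assumptions made for this illustration, not values taken from the slides.

```python
from collections import Counter

# Approximate standard atomic weights (assumed values for this illustration).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "O": 15.999, "Cl": 35.45}

def count_descriptors(atoms):
    """Return simple count descriptors for a list of element symbols."""
    counts = Counter(atoms)
    mw = sum(ATOMIC_WEIGHT[a] for a in atoms)
    return {
        "nAT": len(atoms),        # number of atoms
        "nC": counts["C"],        # number of carbon atoms
        "nCl": counts["Cl"],      # number of chlorine atoms
        "MW": mw,                 # molecular weight (sum of atomic weights)
        "AMW": mw / len(atoms),   # average molecular weight
    }

# Ethanol, C2H6O, written as an explicit atom list.
ethanol = ["C"] * 2 + ["H"] * 6 + ["O"]
d = count_descriptors(ethanol)
print(d)
```

Every value is a plain count or sum, which is why different molecules with the same composition get identical values (high degeneracy).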
slide6

Counting descriptors

A counting descriptor n is a semi-positive variable, i.e. n ≥ 0.

Its statistical distribution is usually a Poisson distribution.

  • Main characteristics
  • simple
  • the most used
  • local information
  • high degeneracy
  • discriminant modelling power
slide7

Empirical descriptors

Descriptors based on specific structural aspects present in sets of congeneric compounds and usually not applicable (or giving a single default value) to compounds of different classes.

slide8

Empirical descriptors

Taillandier index

Taillandier et al., 1983

It is a descriptor dedicated to the modelling of benzene rings and is defined as the sum of the six lengths joining the adjacent substituent groups.

[Figure: benzene ring with substituents H, H, Cl, CH3, H, H]

slide9

Empirical descriptors

Hydrophilicity index (Hy)

Todeschini et al., 1999

It is a descriptor dedicated to the modelling of hydrophilicity and is based on a function of the counting of hydrophilic groups (OH-, SH-, NH-, ...) and carbon atoms.

nHy number of hydrophilic groups

nC number of carbon atoms

n total number of non-hydrogen atoms

-1 ≤ Hy ≤ 3.64

slide10

Empirical descriptors

Compound                  nHy     nC      n       Hy
hydrogen peroxide           2      0      2     3.64
carbonic acid               2      1      3     3.48
water                       2      0      1     3.44
butanetetraol               4      4      8     3.30
propanetriol                3      3      6     2.54
ethanediol                  2      2      4     1.84
methanol                    1      1      2     1.40
ethanol                     1      2      3     0.71
decanediol                  2     10     12     0.52
propanol                    1      3      4     0.37
butanol                     1      4      5     0.17
pentanol                    1      5      6     0.03
methane                     0      1      1     0.00
nHy = 0 and nC = 0          0      0      N     0.00
decanol                     1     10     11    -0.28
ethane                      0      2      2    -0.63
pentane                     0      5      5    -0.90
decane                      0     10     10    -0.96
alkane with nC = 1000       0   1000   1000    -1.00

slide11

Fragment approach

  • Parametric approach (Hammett – Hansch,1964)
  • Substituent approach (Free-Wilson, Fujita-Ban, 1976)
  • DARC-PELCO approach (Dubois, 1966)
  • Sterimol approach (Verloop, 1976)
slide12

Congenericity principle

QSAR strategies can be applied ONLY to classes of similar compounds

Fragment approach

The biological activity of a molecule is the sum of its fragment properties

common reference skeleton

molecule properties gradually modified by substituents

slide13

Hansch approach

Corwin Hansch, 1964

Biological response = f1(L) + f2(E) + f3(S) + f4(M)

1. Lipophilic properties (L)

2. Electronic properties (E)

3. Steric properties (S)

4. Other molecular properties (M)

slide14

Hansch approach

1. Congenericity approach

2. Linear additive scheme

3. Limited representation of global molecular properties

4. No 3D and conformational information

slide15


Free-Wilson approach

slide17

Free-Wilson approach

Free-Wilson, 1964

Site      Candidate substituents
Pos. 1    F    Br    I
Pos. 2    F    Br    I

Iks: absence/presence (0/1) of the k-th substituent in the s-th site
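The indicator variables Iks can be sketched as a one-hot encoding of each compound. The example below assumes the two sites and three substituents shown on the slide; the function name and the example compound are hypothetical.

```python
SUBSTITUENTS = ["F", "Br", "I"]   # candidate substituents at each site
N_SITES = 2                       # Pos. 1 and Pos. 2

def indicator_row(substituents_at_sites):
    """Build the I_ks indicator variables for one compound.

    substituents_at_sites[s] is the substituent found at site s.
    The row is ordered site by site: (Pos.1: F, Br, I, Pos.2: F, Br, I).
    """
    row = []
    for s in range(N_SITES):
        for k in SUBSTITUENTS:
            row.append(1 if substituents_at_sites[s] == k else 0)
    return row

# Hypothetical compound: F at Pos. 1, I at Pos. 2.
row = indicator_row(["F", "I"])
print(row)  # [1, 0, 0, 0, 0, 1]
```

Stacking one such row per compound gives the binary design matrix on which a Free-Wilson type regression would be fitted.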

slide18

Fragment approach

Fingerprints

binary vector (1 = presence of a fragment, 0 = absence of a fragment):

1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

similarity searching
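The slide does not name a similarity measure. As an illustration only, the sketch below assumes the Tanimoto (Jaccard) coefficient, a common choice for binary fingerprints. The first vector is the one shown on the slide; the second is a hypothetical query.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two equal-length binary vectors."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)     # bits set in both
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)    # bits set in at least one
    return both / either if either else 1.0

# Fingerprint from the slide (22 bits).
fp1 = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical second fingerprint sharing three of the four set bits.
fp2 = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print(tanimoto(fp1, fp2))  # 3 shared bits / 5 bits set in either = 0.6
```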

slide19

Molecular graph

[Figure: molecular graph with vertices numbered 1 to 7]

slide20

Molecular graph

Mathematical object defined as

G = (V, E)

set V: vertices (atoms)

set E: edges (bonds)

[Figure: molecular graph with vertices numbered 1 to 7]

slide21

Molecular graph

Usually in the molecular graph hydrogen atoms are not considered

H-depleted molecular graph

slide22

Molecular graph

A walk in G is a sequence of vertices w = (v1, v2, v3, ..., vk) such that {vj, vj+1} ∈ E.

The length of a walk is the number of edges traversed by the walk.

Example: v1 v2 v3 v2 v5 is a walk of length 4.

A path in G is a walk without any repeated vertices.

The length of a path (v1, v2, v3, ..., vk+1) is k.

Example: v1 v2 v3 v4 v5 is a path of length 4.

[Figure: example graph with vertices numbered 1 to 6]
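The walk and path definitions can be checked mechanically. The edge list below is hypothetical: it was chosen to agree with the walk, path and cycle examples quoted in the transcript, and the position of vertex 6 is a pure assumption because the figure itself is not available.

```python
# Hypothetical edge list consistent with the quoted examples.
EDGES = {(1, 2), (2, 3), (3, 4), (4, 5), (2, 5), (4, 6)}

def adjacent(u, v):
    """True if {u, v} is an edge of the (undirected) graph."""
    return (u, v) in EDGES or (v, u) in EDGES

def is_walk(seq):
    """Every pair of consecutive vertices must be joined by an edge."""
    return all(adjacent(a, b) for a, b in zip(seq, seq[1:]))

def is_path(seq):
    """A path is a walk without repeated vertices."""
    return is_walk(seq) and len(set(seq)) == len(seq)

def length(seq):
    """Number of edges traversed."""
    return len(seq) - 1

print(is_walk([1, 2, 3, 2, 5]), is_path([1, 2, 3, 2, 5]), length([1, 2, 3, 2, 5]))
print(is_walk([1, 2, 3, 4, 5]), is_path([1, 2, 3, 4, 5]), length([1, 2, 3, 4, 5]))
```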

slide23

Molecular graph

The topological distance dij is the length of the shortest path between the vertices vi and vj.

Example: d15 = 2

The detour distance dij is the length of the longest path between the vertices vi and vj.

Example: d15 = 4

[Figure: example graph with vertices numbered 1 to 6]
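Both distances can be sketched with standard graph searches: breadth-first search for the shortest path and an exhaustive depth-first search for the longest simple path. The edge list is hypothetical, chosen so that the values quoted on the slide (2 and 4 for vertices 1 and 5) are reproduced. The exhaustive search is only practical for small graphs.

```python
from collections import deque

# Hypothetical edge list; the attachment of vertex 6 is an assumption.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 5), (4, 6)]
NEIGHBORS = {}
for u, v in EDGES:
    NEIGHBORS.setdefault(u, set()).add(v)
    NEIGHBORS.setdefault(v, set()).add(u)

def topological_distance(i, j):
    """Length of the shortest path, found by breadth-first search."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return dist[u]
        for w in NEIGHBORS[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    raise ValueError("vertices are not connected")

def detour_distance(i, j, visited=None):
    """Length of the longest simple path, by exhaustive depth-first search.

    Returns -1 if j cannot be reached without revisiting a vertex.
    """
    visited = (visited or set()) | {i}
    if i == j:
        return 0
    best = -1
    for w in NEIGHBORS[i]:
        if w not in visited:
            sub = detour_distance(w, j, visited)
            if sub >= 0:
                best = max(best, sub + 1)
    return best

print(topological_distance(1, 5), detour_distance(1, 5))
```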

slide24

Molecular graph

A self-returning walk is a walk closed in itself, i.e. a walk starting and ending on the same vertex.

Example: v1 v2 v3 v2 v1 is a self-returning walk of length 4.

A cycle is a walk with no repeated vertices other than its first and last ones (v1 = vk).

Example: v2 v3 v4 v5 v2

[Figure: example graph with vertices numbered 1 to 6]

slide25

Molecular graph

The molecular walk (path) count MWCk (MPCk) of order k is the total number of walks (paths) of length k in the molecular graph.

MWC0 = nSK (no. of atoms)

MWC1 = nBO (no. of bonds)

DRAGON: MWC1, MWC2, …, MWC10

Walk counts are related to:

  • Molecular size
  • Branching
  • Graph complexity
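Walks of length k can be counted from the k-th power of the adjacency matrix, whose entry (i, j) is the number of walks of length k from vertex i to vertex j. The sketch below uses a hypothetical 6-vertex graph and counts each direction of a walk separately, so the raw total for k = 1 is twice the number of bonds; any scaling convention applied by a particular software package is not reproduced here.

```python
# Hypothetical 6-vertex graph with one four-membered ring.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 5), (4, 6)]
N = 6

def adjacency(edges, n):
    """Symmetric 0/1 adjacency matrix as a list of lists."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] = a[v - 1][u - 1] = 1
    return a

def matmul(x, y):
    """Plain-Python product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walk_count(a, k):
    """Sum of all entries of A^k: walks of length k, both directions counted."""
    n = len(a)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    for _ in range(k):
        power = matmul(power, a)
    return sum(map(sum, power))

A = adjacency(EDGES, N)
print([walk_count(A, k) for k in range(4)])
```

For k = 0 the count equals the number of atoms, for k = 1 it is twice the number of bonds, and for k = 2 it equals the sum of the squared vertex degrees, which is how branching enters the count.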
slide26

Molecular graph

The self-returning walk count SRWk of order k is the total number of self-returning walks of length k in the graph.

SRW1 = nSK

SRW2 = nBO

Self-returning walk counts are the spectral moments of the adjacency matrix, i.e. linear combinations of counts of certain fragments contained in the molecular graph (embedding frequencies).

DRAGON: SRW1, SRW2, …, SRW10
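The link to spectral moments can be illustrated through the trace of the powers of the adjacency matrix: the trace of A^k counts closed walks of length k, once per starting vertex and direction. The graph below is hypothetical, and the raw traces are shown without any software-specific convention (for instance, the raw trace of A^2 is twice the number of bonds).

```python
# Hypothetical 6-vertex graph with one four-membered ring.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 5), (4, 6)]
N = 6

def adjacency(edges, n):
    """Symmetric 0/1 adjacency matrix as a list of lists."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] = a[v - 1][u - 1] = 1
    return a

def matmul(x, y):
    """Plain-Python product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def closed_walk_count(a, k):
    """trace(A^k): number of closed walks of length k (k-th spectral moment)."""
    n = len(a)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    for _ in range(k):
        power = matmul(power, a)
    return sum(power[i][i] for i in range(n))

A = adjacency(EDGES, N)
traces = [closed_walk_count(A, k) for k in range(5)]
print(traces)
```

In this example the odd moments vanish because the graph has no odd-membered ring, while the fourth moment collects contributions from bonds, two-bond fragments and the four-membered ring.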

slide27

Molecular graph

Local vertex invariants (LOVIs) are quantities associated to each vertex of a molecular graph.

Graph invariants are molecular descriptors representing graph properties that are preserved by isomorphism.

  • characteristic polynomial
  • derived from local vertex invariants
slide28

Molecular graph and more

Molecular graph → Topological matrix → Algebraic operator → Local vertex invariants / Graph invariants → Molecular descriptors

slide29

graph invariants

From the molecular graph:

topostructural descriptors

  • Wiener index, Hosoya Z index
  • Zagreb indices, Mohar indices
  • Randic connectivity index
  • Balaban distance connectivity index
  • Schultz molecular topological index
  • Kier shape descriptors
  • eigenvalues of the adjacency matrix
  • eigenvalues of the distance matrix
  • Kirchhoff number
  • detour index
  • topological charge indices
  • ...

topochemical descriptors

  • Kier-Hall valence connectivity indices
  • Burden eigenvalues
  • BCUT descriptors
  • Kier alpha-modified shape descriptors
  • 2D autocorrelation descriptors
  • ...

topological information indices

  • total information content on ...
  • mean information content on ...

From the molecular geometry (x, y, z coordinates):

topographic descriptors

  • 3D-Wiener index
  • 3D-Balaban index
  • D/D index
  • ...

slide30

Molecule graph invariants

  • Numerical chemical information extracted from molecular graphs.
  • The mathematical representation of a molecular graph is made by the topological matrices:
          • adjacency matrix
          • atom connectivity matrix
          • distance matrix
          • edge distance matrix
          • incidence matrix

... more than 60 matrix representations of the molecular structure

slide31

Local vertex invariants

Local vertex invariants (LOVIs) are quantities associated to each vertex of a molecular graph.

  • Examples:
  • atom vertex degree
  • valence vertex degree
  • sum of the vertex distance degree
  • maximum vertex distance degree
slide32

Topological matrices

Adjacency matrix

Derived from a molecular graph, it represents the whole set of connections between adjacent pairs of atoms.

$a_{ij} = \begin{cases} 1 & \text{if atoms } i \text{ and } j \text{ are bonded} \\ 0 & \text{otherwise} \end{cases}$

slide33

Topological matrices

Bond number B

It is the simplest graph invariant obtained from the adjacency matrix.

It is the number of bonds in the molecular graph calculated as:

$B = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{A} a_{ij}$

where aij is the entry of the adjacency matrix and A is the number of atoms.

slide34

Local vertex invariants

atom vertex degree

It is the row sum of the vertex adjacency matrix

Adjacency matrix of the example graph (vertices numbered 1 to 7), with the vertex degree δi as the last column:

     1  2  3  4  5  6  7    δi
1    0  1  0  0  0  0  0     1
2    1  0  1  0  1  0  1     4
3    0  1  0  1  0  1  0     3
4    0  0  1  0  0  0  0     1
5    0  1  0  0  0  0  0     1
6    0  0  1  0  0  0  0     1
7    0  1  0  0  0  0  0     1
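A short sketch of the row-sum calculation, using the 7-vertex adjacency matrix of this example. The bond number is obtained here as half the sum of all entries, which assumes the usual convention that each bond appears twice in the symmetric matrix.

```python
# Adjacency matrix of the 7-vertex example graph.
A = [
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
]

# Vertex degree: row sum of the adjacency matrix.
degrees = [sum(row) for row in A]

# Bond number: every bond is counted once in each of its two rows.
bond_number = sum(degrees) // 2

print(degrees, bond_number)
```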

slide35

Local vertex invariants

valence vertex degree

for atoms of the 2nd principal quantum number (C, N, O, F):

$\delta_i^v = Z_i^v - h_i$

$Z_i^v$: number of valence electrons of the i-th atom

$h_i$: number of hydrogens bonded to the i-th atom

slide36

Local vertex invariants

valence vertex degree

the vertex degree of the i-th atom is the count of edges incident with the i-th atom, i.e. the count of σ bonds or σ electrons.

slide37

Local vertex invariants

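valence vertex degree

for atoms with principal quantum number > 2:

$\delta_i^v = \frac{Z_i^v - h_i}{Z_i - Z_i^v - 1}$

$Z_i$: total number of electrons of the i-th atom (atomic number)

$Z_i^v$ and $h_i$: number of valence electrons of the i-th atom and number of hydrogens bonded to it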
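A minimal sketch of the valence vertex degree, assuming the usual Kier-Hall definitions: valence electrons minus bonded hydrogens for second-row atoms, and the same numerator divided by (atomic number minus valence electrons minus 1) for heavier atoms. The function name and argument names are hypothetical.

```python
def valence_vertex_degree(z_valence, n_hydrogens, z_total=None):
    """Valence vertex degree of one atom.

    z_valence   -- number of valence electrons
    n_hydrogens -- number of hydrogens bonded to the atom
    z_total     -- atomic number; pass it for atoms beyond the second row
    """
    if z_total is None:
        return z_valence - n_hydrogens
    return (z_valence - n_hydrogens) / (z_total - z_valence - 1)

print(valence_vertex_degree(4, 3))        # carbon in a CH3 group
print(valence_vertex_degree(6, 1))        # oxygen in an OH group
print(valence_vertex_degree(7, 0, 17))    # chlorine
```

For a second-row atom the denominator of the general expression equals 1, so both branches give the same value there.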

slide38

Topological descriptors

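Zagreb indices (Gutman, 1975)

$M_1 = \sum_{i=1}^{A} \delta_i^2 \qquad M_2 = \sum_{b} (\delta_i \delta_j)_b$

δi: vertex degree of the i-th atom; the second sum runs over all bonds b, with i and j the two atoms forming the bond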
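A small sketch of the two Zagreb indices, assuming their usual definitions: the first is the sum of the squared vertex degrees, the second is the sum over all bonds of the product of the degrees of the two bonded atoms. The edge list is an assumption read from the 7-vertex example graph of this presentation.

```python
# Edge list assumed from the 7-vertex example graph.
EDGES = [(1, 2), (2, 3), (2, 5), (2, 7), (3, 4), (3, 6)]

def vertex_degrees(edges):
    """Vertex degree of every atom, keyed by atom number."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

deg = vertex_degrees(EDGES)
m1 = sum(d * d for d in deg.values())           # first Zagreb index
m2 = sum(deg[u] * deg[v] for u, v in EDGES)     # second Zagreb index

print(m1, m2)
```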

slide39

Topological descriptors

Kier-Hall connectivity indices (1986)

They are based on molecular graph decomposition into fragments (subgraphs) of different size and complexity and use atom vertex degrees as subgraph weight.

Randic branching index (1975)

$\chi = \sum_{b} (\delta_i \delta_j)_b^{-1/2}$

The term $(\delta_i \delta_j)^{-1/2}$ is called edge connectivity.

slide40

Topological descriptors

mean Randic branching index
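A sketch of the Randic branching index, assuming its usual definition as the sum over all bonds of the inverse square root of the product of the two vertex degrees. The mean index is taken here as the index divided by the number of bonds, which is an assumption since the formula is not present in the transcript. The edge list is assumed from the 7-vertex example graph.

```python
import math

# Edge list assumed from the 7-vertex example graph.
EDGES = [(1, 2), (2, 3), (2, 5), (2, 7), (3, 4), (3, 6)]

def vertex_degrees(edges):
    """Vertex degree of every atom, keyed by atom number."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def randic_index(edges):
    """Sum of the edge connectivities (deg_i * deg_j) ** -0.5 over all bonds."""
    deg = vertex_degrees(edges)
    return sum(1.0 / math.sqrt(deg[u] * deg[v]) for u, v in edges)

chi = randic_index(EDGES)
mean_chi = chi / len(EDGES)   # assumed definition: index divided by bond count

print(round(chi, 4), round(mean_chi, 4))
```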

slide41

Topological descriptors

atom connectivity indices of m-th order

$^m\chi_q = \sum_{k=1}^{K} \left( \prod_{a=1}^{n} \delta_a \right)_k^{-1/2}$

where the sum runs over the K subgraphs of order m and type q, and the product runs over the n atoms of each subgraph.

The immediate bonding environment of each atom is encoded by the subgraph weight.

The number of terms in the sum depends on the molecular structure.

The connectivity indices show a good capability of isomer discrimination and reflect some features of molecular branching.

mP: number of m-th order paths

q: subgraph type (Path, Cluster, Path/Cluster, Chain)

n = m for Chain (Ring) subgraph type

n = m + 1 otherwise

slide42

Topological descriptors

valence connectivity indices of m-th order

They encode atom identities as well as the connectivities in the molecular graph.

slide43

Topological descriptors

Kier-Hall electronegativity

Kier-Hall relative electronegativity

electronegativity of carbon sp3 taken as zero

principal quantum number

correlation with the Mulliken-Jaffe electronegativity:

slide44

Distance matrix

The distance dij between two vertices is the smallest number of edges between them.

Distance matrix of the example graph (vertices numbered 1 to 7); si is the row sum and ηi the largest entry of each row:

     1  2  3  4  5  6  7    si   ηi
1    0  1  2  3  2  3  2    13    3
2    1  0  1  2  1  2  1     8    2
3    2  1  0  1  2  1  2     9    2
4    3  2  1  0  3  2  3    14    3
5    2  1  2  3  0  3  2    13    3
6    3  2  1  2  3  0  3    14    3
7    2  1  2  3  2  3  0    13    3

vertex distance degree si: it is the row sum of the vertex distance matrix

si is high for terminal vertices and low for central vertices
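The distance matrix and its row sums can be derived from the bonds alone. The sketch below runs a breadth-first search from every vertex; the edge list is an assumption read from the distance-1 entries of the 7-vertex example.

```python
from collections import deque

# Edge list assumed from the 7-vertex example graph.
EDGES = [(1, 2), (2, 3), (2, 5), (2, 7), (3, 4), (3, 6)]
N = 7

def distance_matrix(edges, n):
    """Topological distance matrix by breadth-first search from each vertex."""
    neighbors = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    d = [[0] * n for _ in range(n)]
    for start in range(1, n + 1):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for v, steps in dist.items():
            d[start - 1][v - 1] = steps
    return d

D = distance_matrix(EDGES, N)
s = [sum(row) for row in D]   # vertex distance degrees (row sums)

print(s)
```

The central vertex 2 gets the smallest row sum and the terminal vertices 4 and 6 the largest, in line with the remark above.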

slide45

Local vertex invariants

The eccentricity ηi of the i-th atom is the upper bound of the distance dij between the atom i and the other atoms j: $\eta_i = \max_j (d_{ij})$

slide46

Topological descriptors

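Petitjean shape index (1992)

A simple shape descriptor

$I_{PJ} = \frac{D - R}{R}$

where D is the topological diameter (largest atom eccentricity) and R is the topological radius (smallest atom eccentricity); the eccentricity of an atom is its largest topological distance from any other atom.

IPJ = 0 for strictly cyclic structures

IPJ = 1 for strictly acyclic structures with an even diameter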
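A sketch of the eccentricity and of the Petitjean shape index, assuming the index is defined as (diameter minus radius) divided by radius, with diameter and radius taken as the largest and smallest atom eccentricity. The distance matrix is the 7-vertex example of this presentation.

```python
# Distance matrix of the 7-vertex example graph.
D = [
    [0, 1, 2, 3, 2, 3, 2],
    [1, 0, 1, 2, 1, 2, 1],
    [2, 1, 0, 1, 2, 1, 2],
    [3, 2, 1, 0, 3, 2, 3],
    [2, 1, 2, 3, 0, 3, 2],
    [3, 2, 1, 2, 3, 0, 3],
    [2, 1, 2, 3, 2, 3, 0],
]

# Eccentricity: largest distance from each atom to any other atom.
eccentricity = [max(row) for row in D]

radius = min(eccentricity)     # smallest eccentricity
diameter = max(eccentricity)   # largest eccentricity
petitjean = (diameter - radius) / radius

print(eccentricity, radius, diameter, petitjean)
```

The example is acyclic with an odd diameter (3), so the value falls between the two limiting cases listed above.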

slide47

Topological descriptors

Wiener index (1947)

$W = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{A} d_{ij}$

dij: topological distances; A: number of atoms

high values for big molecules and for linear molecules

low values for small molecules and for branched or cyclic molecules

The average Wiener index is independent of the molecular size:

$\bar{W} = \frac{2W}{A(A-1)}$
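A sketch of the Wiener index as half the sum of all entries of the distance matrix, using the 7-vertex example. The average Wiener index is computed here as the mean distance over all atom pairs, which is the assumed definition.

```python
# Distance matrix of the 7-vertex example graph.
D = [
    [0, 1, 2, 3, 2, 3, 2],
    [1, 0, 1, 2, 1, 2, 1],
    [2, 1, 0, 1, 2, 1, 2],
    [3, 2, 1, 0, 3, 2, 3],
    [2, 1, 2, 3, 0, 3, 2],
    [3, 2, 1, 2, 3, 0, 3],
    [2, 1, 2, 3, 2, 3, 0],
]

n_atoms = len(D)

# Each pair (i, j) appears twice in the symmetric matrix, hence the halving.
wiener = sum(map(sum, D)) // 2

# Assumed definition: Wiener index divided by the number of atom pairs.
average_wiener = 2 * wiener / (n_atoms * (n_atoms - 1))

print(wiener, average_wiener)
```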

slide48

Topological descriptors

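Balaban distance connectivity index (1982)

$J = \frac{B}{C+1} \sum_{b} (s_i s_j)_b^{-1/2} = \frac{1}{C+1} \sum_{b} (\bar{s}_i \bar{s}_j)_b^{-1/2}$

A: number of atoms

B: number of bonds

C: number of cycles, C = B - A + 1

si: sum of the i-th row distances

$\bar{s}_i = s_i / B$: average sum of the i-th row distances

The sum runs over all bonds b, with i and j the two atoms forming the bond.

one of the most discriminant indices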
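A sketch of the Balaban index, assuming the usual form: number of bonds divided by (number of cycles + 1), times the sum over all bonds of the inverse square root of the product of the two row sums. The edge list and row sums are assumptions taken from the 7-vertex example of this presentation.

```python
import math

# Edge list and distance-matrix row sums assumed from the 7-vertex example.
EDGES = [(1, 2), (2, 3), (2, 5), (2, 7), (3, 4), (3, 6)]
ROW_SUMS = {1: 13, 2: 8, 3: 9, 4: 14, 5: 13, 6: 14, 7: 13}

def balaban_j(edges, s):
    """Balaban distance connectivity index from bonds and distance row sums."""
    n_atoms = len(s)
    n_bonds = len(edges)
    n_cycles = n_bonds - n_atoms + 1          # cyclomatic number
    total = sum(1.0 / math.sqrt(s[i] * s[j]) for i, j in edges)
    return n_bonds / (n_cycles + 1) * total

J = balaban_j(EDGES, ROW_SUMS)
print(round(J, 4))
```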

slide49

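Edge descriptors

[Figure: example graph with atoms numbered 1 to 7 and bonds labelled a to f]

Edge distance matrix of the example graph (bonds labelled a to f); Esi is the row sum and Eηi the largest entry of each row:

     a  b  c  d  e  f    Esi   Eηi
a    0  1  2  1  2  1     7     2
b    1  0  1  1  1  1     5     1
c    2  1  0  2  1  2     8     2
d    1  1  2  0  2  1     7     2
e    2  1  1  2  0  2     8     2
f    1  1  2  1  2  0     7     2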

slide50

Topographic descriptors

Some geometrical descriptors are derived from the corresponding topological descriptors substituting the topological distances dst by the geometrical distances rst.

They are called topographic descriptors.

For example, the 3D-Wiener index:

$^{3D}W = \frac{1}{2} \sum_{s=1}^{A} \sum_{t=1}^{A} r_{st}$

where A is the number of atoms.

slide51

Molecular geometry

The geometry matrix G (or geometric distance matrix) is a square symmetric matrix whose entry rst is the geometric distance calculated as the Euclidean distance between the atoms s and t:

$r_{st} = \sqrt{(x_s - x_t)^2 + (y_s - y_t)^2 + (z_s - z_t)^2}$
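A sketch of the geometry matrix and of a 3D-Wiener index obtained from it, assuming the index is half the sum of all interatomic Euclidean distances. The three coordinates are hypothetical and were chosen to give round distances; they do not describe a real molecule.

```python
import math

# Hypothetical x, y, z coordinates of three atoms.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]

def geometry_matrix(points):
    """Square symmetric matrix of Euclidean distances between atoms."""
    return [[math.dist(p, q) for q in points] for p in points]

G = geometry_matrix(coords)

# Same halving as for the topological Wiener index, with geometric distances.
wiener_3d = sum(map(sum, G)) / 2

for row in G:
    print(row)
print(wiener_3d)
```

Replacing the topological distances by these geometric distances is the only change with respect to the topological index, which is the idea behind topographic descriptors.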

slide52


Website: michem.disat.unimib.it/chm/


THANK YOU

slide65

Hansch approach

Hansch molecular descriptors

lipophilic properties

  • partition coefficients (logP, logKow)
  • chromatographic parameters (Rf, RT)
  • solubility
  • ...

electronic properties

  • Hammett constants
  • molar refraction
  • dipole moment
  • HOMO, LUMO
  • ionization potential
  • ...

steric properties

  • molecular weight
  • VDW volume
  • molar volume
  • surface area
  • ...