Prediction of NMR Chemical Shifts. A Chemometrical Approach

K.A. Blinov, Y.D. Smurnyy, T.S. Churanova, M.E. Elyashberg

Advanced Chemistry Development (ACD)

Spectra

Structure

- In many cases several candidate structures are consistent with the spectral data.
- We then need a method to rank these structures.
- The most powerful method is to compare the experimental and predicted 13C NMR spectra.

[Figure: experimental and predicted spectra compared; values shown on the slide: 2.00 and 9.62]

- In most cases the predicted spectrum of the “correct structure” gives the best fit to the experimental spectrum.
- In practice the average deviation between predicted and experimental spectra for the “correct structure” is 2-3 ppm.
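
The ranking idea above can be sketched in a few lines of Python. This is only an illustration: the slides do not say how predicted and experimental signals are paired, so the sketch assumes pairing by sorted order, and all shift values and candidate names are hypothetical.

```python
from statistics import mean

def average_deviation(experimental, predicted):
    """Mean absolute difference (ppm) between two 13C shift lists.

    Assumption: signals are paired by sorting both lists, because the
    slides do not describe the assignment procedure.
    """
    if len(experimental) != len(predicted):
        raise ValueError("spectra must have the same number of shifts")
    pairs = zip(sorted(experimental), sorted(predicted))
    return mean(abs(e - p) for e, p in pairs)

def rank_candidates(experimental, candidates):
    """Return (name, deviation) pairs, best fit first."""
    scored = [(name, average_deviation(experimental, shifts))
              for name, shifts in candidates.items()]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical data for illustration only.
experimental = [14.1, 22.7, 31.9, 128.5, 170.2]
candidates = {
    "A": [14.0, 23.0, 32.5, 127.9, 171.0],
    "B": [18.3, 29.5, 40.1, 120.2, 160.4],
}
ranking = rank_candidates(experimental, candidates)
```

Here candidate "A" comes first because its average deviation is far smaller than that of "B".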

- Real-world task: an unknown structure with molecular formula C29H32N2O5 and spectral data (1D and 2D NMR).
- 20 min to generate all structures (> 12 000).
- 24 hours to predict the 13C NMR spectra of all the generated structures.
- The speed of spectra prediction should be increased.

- Quantum Mechanics – extremely slow
- Database approach (HOSE Codes, Maximum Common Substructure) – accurate but slow
- Rule-based (Additive scheme, Neural Networks) – fast but inaccurate
- Our choice – improve the accuracy of a fast method

Additive scheme: $\delta = \sum_i a_i x_i$

Worked example (ppm):

$\delta = 153.71 - 1.85 - 4.49 - 1.39 - 2.79 + 1.43 + 0.52 + 0.52 - 1.35 = 144.31$
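
The arithmetic of the additive scheme above can be checked with a short Python sketch. It treats 153.71 as the base value and the remaining eight numbers as increments, which is an assumption about how the slide was laid out; the function name is illustrative.

```python
def additive_shift(base, increments):
    """Predicted shift (ppm) = base value + sum of increments."""
    return base + sum(increments)

# Numbers taken from the worked example on the slide.
base = 153.71
increments = [-1.85, -4.49, -1.39, -2.79, 1.43, 0.52, 0.52, -1.35]
delta = additive_shift(base, increments)
print(f"{delta:.2f}")  # prints 144.31
```

The increments sum to -9.40 ppm, which moves the base value of 153.71 ppm to the 144.31 ppm shown on the slide.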

Main problem – find correct values of atom increments

- We have a database of 1.5 million 13C chemical shifts.
- We can try to derive the correct values from it!
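
One way to obtain increments from a table of known shifts is a least-squares fit. The slides do not name the regression algorithm at this point, so ordinary least squares via the normal equations is used here purely as an assumption, on hypothetical toy data and in plain Python to stay self-contained.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular (no check is made).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def fit_increments(rows, shifts):
    """Increments a minimizing sum((shift - sum(a_i * x_i)) ** 2)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, shifts)) for i in range(k)]
    return solve_linear(xtx, xty)

# Hypothetical toy data: column 0 is a constant 1 (base value),
# columns 1 and 2 count two substituent types.
rows = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 0]]
true_a = [30.0, 9.0, -2.5]
shifts = [sum(a * x for a, x in zip(true_a, r)) for r in rows]
fitted = fit_increments(rows, shifts)
```

Because the toy shifts were generated from `true_a` without noise, the fit recovers those increments up to rounding error.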

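Atom environment encoding

[Figure: each sample is one row of the input-variable matrix X; its columns hold the number of atoms of each type (…, CH3, CH2, CH2, CH, C, O, …) found in the 1st sphere, the 2nd sphere, and so on. Example counts shown on the slide: 2, 1, 1, 1, 1, 1. The corresponding chemical shifts form the vector Y.]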
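
The encoding described above (counts of each atom type in each sphere) might look like this in Python. The function name, the input format, and the example environment are hypothetical; only the atom type labels come from the slide.

```python
from collections import Counter

def encode_environment(neighbours, atom_types, n_spheres):
    """Count atoms of each type in each sphere around the central atom.

    neighbours: list of (atom_type, sphere) tuples, sphere starting at 1.
    Returns a flat list: all type counts for sphere 1, then sphere 2, ...
    """
    counts = Counter((t, s) for t, s in neighbours if 1 <= s <= n_spheres)
    return [counts[(t, s)]
            for s in range(1, n_spheres + 1)
            for t in atom_types]

# Hypothetical environment; the type list follows the slide.
atom_types = ["CH3", "CH2", "CH", "C", "O"]
neighbours = [("CH3", 1), ("CH3", 1), ("CH2", 1),
              ("CH2", 2), ("CH", 2), ("C", 2), ("O", 2)]
row = encode_environment(neighbours, atom_types, n_spheres=2)
print(row)  # [2, 1, 0, 0, 0, 0, 1, 1, 1, 1]
```

Each such row becomes one sample in X, and the measured shift of the central atom becomes the matching entry of Y.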

- The best structure representation scheme is not evident in advance.
- We should find the scheme that gives the best accuracy.
- We should optimize:
- the substituent coding scheme
- the number of “spheres” used

- 210 K chemical shifts were used as the training set.
- 170 K chemical shifts from recent literature were used as an external validation set.

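Each “substituent” is described relative to the “central” atom by the following properties (example values from the slide are given after the colon):

- Atom type (C, O, etc.): 7 (N)
- Hybridization (sp3, sp2, etc.): 1 (sp3)
- Valence: 3
- Number of neighbor H: 2
- Charge: 0
- Distance to “central” atom (bonds): 3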

- The best possible average deviation with this scheme is 3.5 ppm.
- We need less than 3 ppm (2 ppm is preferable).
- Should we use additional variables?
- We should be very careful when adding variables.

[Figure: values in slide order (ppm): 125.38, 134.16, 138.30, 125.90, 141.48, +11.26, +2.48, 122.90, 136.64, 127.86, 145.42, Δ -1.94, Δ +1.34, Δ -3.94]

Input variables: atoms and pairs of atoms (“crosses”)

[Figure: besides atom counts, the input variables include the number of pairs of each atom pair type, e.g. …, “CH2 and CH”: 1, “C and O”: 1, …]
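
A minimal sketch of how pair (“cross”) variables could be counted is shown below. The slides do not say which pairs are included or how pair types are labelled, so counting every unordered pair of substituents around one central atom is an assumption, and the example environment is hypothetical.

```python
from collections import Counter
from itertools import combinations

def encode_crosses(substituents):
    """Count unordered pairs of substituent types around one central atom."""
    pairs = Counter()
    for a, b in combinations(substituents, 2):
        # Sort the two labels so ("CH2", "CH") and ("CH", "CH2") coincide.
        pairs[tuple(sorted((a, b)))] += 1
    return pairs

substituents = ["CH2", "CH", "C", "O"]  # hypothetical environment
crosses = encode_crosses(substituents)
```

For this environment the pair types “CH2 and CH” and “C and O” each occur once, matching the kind of entries shown on the slide, alongside four other pairs.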

[Figure: mean error (ppm) plotted against the number of spheres and the distance between atoms within a cross]

- The accuracy is now good enough (2.3 ppm).
- But it is still poor in some cases.
- Unfortunately, these cases are very important.
- These “special” cases should be taken into account.

- We use the “topological” distance (number of bonds).
- Sometimes equal topological distances correspond to different “real” distances.
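
To make the notion of topological distance concrete, here is a small breadth-first search over a bond list. The molecule, the atom numbering, and the function name are hypothetical. The result depends only on connectivity, which is why two atoms at the same topological distance can sit at different distances in space.

```python
from collections import deque

def topological_distances(bonds, start):
    """Number of bonds on the shortest path from `start` to every atom."""
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nxt in graph.get(atom, []):
            if nxt not in dist:
                dist[nxt] = dist[atom] + 1
                queue.append(nxt)
    return dist

# Hypothetical six-membered ring (atoms 1-6) with substituent atom 7 on atom 1.
bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 7)]
dist = topological_distances(bonds, start=7)
```

In this example atoms 3 and 5 are both three bonds away from atom 7, whatever the 3D geometry of the ring.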

“Stereo” effects

[Figure: two examples, values in slide order: 25.7 at 3.9 Å and 17.6 at 2.9 Å]

Atoms

Pairs of atoms (Crosses)

Variables

- We have 1.5 million chemical shifts.
- We should try to use all available data.
- The only problem is matrix size.
- In many cases the matrix becomes larger than 2 GB.
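
A quick estimate shows how easily a dense matrix over the full data set passes 2 GB. The number of shifts comes from the slide; the number of variables (200) and the 8-byte value size are assumptions chosen only for illustration.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory needed to hold a dense n_rows x n_cols matrix."""
    return n_rows * n_cols * bytes_per_value

n_shifts = 1_500_000   # from the slide
n_variables = 200      # hypothetical; the slides give no variable count
size_gb = dense_matrix_bytes(n_shifts, n_variables) / 1e9
print(f"{size_gb:.1f} GB")  # prints 2.4 GB
```

With these assumed figures the matrix already needs about 2.4 GB, and every additional variable adds another 12 MB.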

Faster by 3 orders of magnitude!

C29H32N2O5

- Combining a “new” method with an old, well-known algorithm can produce a very good (and unexpected) result.