Prediction of NMR Chemical Shifts. A Chemometrical Approach

K.A. Blinov, Y.D. Smurnyy, T.S. Churanova, M.E. Elyashberg

Advanced Chemistry Development (ACD)


Structure and its spectral data

[Figure: a structure and the corresponding spectra]


Sometimes the solution is not obvious

  • In many cases several structures are consistent with the spectral data.

  • In this case we need a method to rank the structures.

  • The most powerful method is to compare experimental and predicted 13C NMR spectra.


13C NMR spectral data

[Figure: experimental and predicted 13C NMR spectra; the values 2.00 and 9.62 are shown on the slide]


How to find the best structure?

  • In most cases the predicted spectrum of the “correct” structure gives the best fit to the experimental spectrum.

  • In practice, the average deviation between the predicted and experimental spectra of the “correct” structure is 2-3 ppm.
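The ranking idea can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the candidate labels, and all shift values are hypothetical, and pairing the shifts by sorting both lists is a simplifying assumption.

```python
def mean_abs_deviation(experimental, predicted):
    """Average absolute difference (ppm) between two shift lists.

    Assumption: shifts are paired by sorting both lists; a real tool
    would use atom-by-atom assignments where they are available.
    """
    if len(experimental) != len(predicted):
        raise ValueError("spectra must contain the same number of shifts")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)


def rank_candidates(experimental, candidates):
    """Return (deviation, name) tuples, best-fitting candidate first."""
    scored = [
        (mean_abs_deviation(experimental, shifts), name)
        for name, shifts in candidates.items()
    ]
    return sorted(scored)


# Hypothetical data: one experimental spectrum and two candidate structures.
experimental = [14.1, 22.7, 31.9, 128.5]
candidates = {
    "A": [14.0, 23.0, 32.5, 129.0],
    "B": [20.0, 30.0, 40.0, 120.0],
}
ranking = rank_candidates(experimental, candidates)
for deviation, name in ranking:
    print(f"{name}: {deviation:.2f} ppm")
```

With these made-up numbers, candidate A is ranked first because its predicted shifts lie much closer to the experimental ones.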


The role of the spectra prediction

  • Real-world task: an unknown structure with molecular formula C29H32N2O5 and spectral data (1D and 2D NMR).

  • 20 min to generate all structures (> 12,000)

  • 24 hours to predict the 13C NMR spectra of all the generated structures

  • The speed of spectra prediction should be increased
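As a rough check of these figures, assuming the 24 hours are spread evenly over about 12,000 structures:

$$t \approx \frac{24 \times 3600\ \text{s}}{12\,000} = 7.2\ \text{s per structure}$$

Since more than 12,000 structures were generated, this is an upper bound on the average prediction time per structure.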


Methods of the prediction of NMR spectra

  • Quantum mechanics – extremely slow

  • Database approach – accurate but slow

    • HOSE codes

    • Maximum common substructure

  • Rule-based – fast but inaccurate

    • Additive scheme

    • Neural networks

  • Our choice – improve the accuracy of a fast method


Additive scheme

[Figure: structure annotated with the increments of individual atoms]

$$\delta = \sum_i a_i x_i$$

$$\delta = 153.71 - 1.85 - 4.49 - 1.39 - 2.79 + 1.43 + 0.52 + 0.52 - 1.35 = 144.31$$

Main problem: finding the correct values of the atom increments
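The arithmetic of the additive scheme can be reproduced with the numbers from the slide. This is a minimal sketch: treating 153.71 as a base value and the remaining numbers as increments is an interpretation of the slide, and the function name is hypothetical.

```python
def additive_shift(base_value, increments):
    """Return the base value plus the sum of all atom increments (ppm)."""
    return base_value + sum(increments)


# Values as they appear on the slide (assumed roles: base value, increments).
base_value = 153.71
increments = [-1.85, -4.49, -1.39, -2.79, 1.43, 0.52, 0.52, -1.35]

shift = additive_shift(base_value, increments)
print(f"{shift:.2f}")  # 144.31
```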


Available data

  • We have a database of 1.5 million 13C chemical shifts.

  • We can try to obtain the correct increment values from it!


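How to encode atom environment

[Figure: the environment of an atom encoded as input variables. Each variable is the number of atoms of a given type in the 1st or 2nd sphere. Atom types shown: CH3, CH2, CH2, CH, C, O; numbers of atoms shown: 2, 1, 1, 1, 1, 1.]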
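One possible way to build such variables is to walk outwards from the atom of interest and count atom types in each sphere. The sketch below is an assumption-laden illustration, not the encoding used by the authors: the molecule (a hypothetical heavy-atom skeleton CH3-CH(OH)-CH2-CH3), the type labels, and the function name are all made up for the example.

```python
from collections import Counter, deque


def encode_environment(atom_types, bonds, center, n_spheres=2):
    """Count atom types in each sphere (topological distance) around `center`.

    Returns a dict mapping (sphere, atom_type) to the number of such atoms.
    """
    neighbors = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Breadth-first search, stopping once the outermost sphere is reached.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        current = queue.popleft()
        if distance[current] == n_spheres:
            continue
        for nxt in neighbors[current]:
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)

    counts = Counter()
    for atom, d in distance.items():
        if d > 0:  # skip the central atom itself
            counts[(d, atom_types[atom])] += 1
    return dict(counts)


# Hypothetical skeleton: CH3-CH(OH)-CH2-CH3, central atom is the CH carbon.
atom_types = ["CH3", "CH", "OH", "CH2", "CH3"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]
variables = encode_environment(atom_types, bonds, center=1)
print(variables)
```

Each key of the resulting dictionary corresponds to one input variable (one column of the data matrix), and its value is the entry for this particular atom.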


Data for PLS regression

[Figure: layout of the data. X: atom environment encoding; Y: chemical shifts; rows: samples.]
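To make the X/Y layout concrete, here is a small pure-Python sketch of single-response PLS (PLS1, NIPALS-style deflation). It is a textbook variant chosen for illustration and is not claimed to be the implementation behind the slides; the toy data, the coefficients used to generate them, and the function names are hypothetical.

```python
def pls1_fit(X, y, n_components):
    """Fit PLS1 by sequential deflation; returns (x_mean, y_mean, components)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]                                   # weights
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one sample with a fitted PLS1 model."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


# Toy data: rows are samples, columns are counts of three made-up atom types.
# The "shifts" were generated as 100 + 2*x1 - 3*x2 + 0.5*x3 (hypothetical).
X = [[1, 0, 2], [0, 1, 1], [2, 1, 0], [1, 2, 1], [0, 0, 1], [3, 1, 2]]
y = [103.0, 97.5, 101.0, 96.5, 100.5, 104.0]

model = pls1_fit(X, y, n_components=3)
prediction = pls1_predict(model, [2, 0, 1])
print(f"{prediction:.2f}")
```

On this noise-free toy set, using as many components as there are variables should reproduce the generating linear relation, so the fitted coefficients play the role of the atom increments. Real data would need far fewer components than variables, chosen by validation.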


Find the best structure encoding

  • The best scheme of structure representation is not evident in advance

  • We should find the scheme that gives the best accuracy

  • We should optimize

    • the substituent coding scheme

    • the number of “spheres” used


Data used

  • 210,000 chemical shifts were used as the training set.

  • 170,000 chemical shifts from recent literature were used as an external validation set.


How to describe atom type

[Figure: a “central” atom and a “substituent”; the substituent is described by the properties below, with the example values shown on the slide]

  • Atom type (C, O, etc.): 7 (N)

  • Hybridization (sp3, sp2, etc.): 1 (sp3)

  • Valence: 3

  • Number of neighboring H atoms: 2

  • Charge: 0

  • Distance to the “central” atom (in bonds): 3
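A descriptor of this kind maps naturally onto a small immutable record that can be counted to produce input variables. The sketch below assumes the six properties listed on the slide; the class name, field names, and the second example atom are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomDescriptor:
    """One substituent atom as seen from the central atom (hashable)."""
    atomic_number: int   # atom type, e.g. 6 = C, 7 = N
    hybridization: int   # coded value, 1 = sp3 as in the slide example
    valence: int
    n_hydrogens: int     # number of neighboring H atoms
    charge: int
    distance: int        # bonds to the central atom


# Example from the slide: an sp3 nitrogen with two H atoms, three bonds away.
nh2 = AtomDescriptor(7, 1, 3, 2, 0, 3)
# Hypothetical second atom type: a methyl carbon one bond away.
ch3 = AtomDescriptor(6, 1, 4, 3, 0, 1)

# Identical descriptors collapse into one variable whose value is a count.
variables = Counter([nh2, ch3, ch3])
print(variables[ch3], variables[nh2])  # 2 1
```

Dropping a field from the record (for example `charge`) gives a coarser encoding with fewer variables, which is one way to compare coding schemes.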


Result for different atom encoding


Result for number of spheres


Is it the best possible accuracy?

  • The best possible average deviation is 3.5 ppm.

  • We need less than 3 ppm (2 ppm is preferable).

  • Should we use additional variables?

  • We should be very careful when adding variables.


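Substituent interference (cross effect)

[Figure: structures illustrating non-additive substituent effects. Shift values shown on the slide (ppm): 125.38, 134.16, 138.30, 125.90, 141.48, 122.90; increments: +11.26, +2.48; sums 125.38 + 11.26 = 136.64, 125.38 + 2.48 = 127.86, 134.16 + 11.26 = 145.42; deviations Δ: -1.94, +1.34, -3.94.]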


Enhanced structure encoding

[Figure: the input variables now consist of two blocks, atoms and pairs of atoms (“crosses”). Example atom pair types: “CH2 and CH”, “C and O”; number of pairs: 1 each.]
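Pair variables can be generated from the same list of substituents that produces the atom variables. The following sketch is a simplified assumption: it ignores spheres and the distance between the atoms within a cross, and the substituent labels and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def encode_crosses(substituents):
    """Count unordered pairs of substituent types around one central atom."""
    pairs = Counter()
    for a, b in combinations(substituents, 2):
        pairs[tuple(sorted((a, b)))] += 1
    return dict(pairs)


# Hypothetical substituents of one central atom.
substituents = ["CH2", "CH", "O"]

atom_variables = dict(Counter(substituents))     # block 1: atoms
cross_variables = encode_crosses(substituents)   # block 2: pairs (crosses)
print(atom_variables)
print(cross_variables)
```

The number of possible pair variables grows roughly with the square of the number of atom types, which is one reason to be careful when adding variables.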


Result for atom pairs (crosses)

[Figure: mean error (ppm) as a function of the number of spheres and of the distance between atoms within a cross]


More enhancements?

  • The accuracy is now good enough (2.3 ppm)

  • But it is still poor in some cases

  • Unfortunately, these cases are very important

  • These “special” cases should be taken into account


Stereo effects: double bonds

  • We use “topological” distance

  • Sometimes equal topological distances correspond to different “real” distances

[Figure: atoms at the same topological distance but different real distances; values shown on the slide: 25.7 (3.9 Å) and 17.6 (2.9 Å)]


Modified structure encoding

[Figure: the variables now consist of three blocks: atoms, pairs of atoms (crosses), and “stereo” effects]


Prediction of spectra by different methods (mean error, ppm)


Size of training set

  • We have 1.5 million chemical shifts

  • We should try to use all available data

  • Only one problem – matrix size

  • In many cases the matrix size exceeds 2 GB
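A back-of-the-envelope estimate shows how quickly the 2 GB limit is reached. The sketch assumes one row per chemical shift, a dense matrix of 8-byte floating-point values, and 2 GB taken as 2 × 1024³ bytes; none of these details are stated on the slides.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory needed for a dense matrix (assumed 8-byte values by default)."""
    return n_rows * n_cols * bytes_per_value


TWO_GB = 2 * 1024**3
n_shifts = 1_500_000  # one row per chemical shift (assumption)

# Largest number of columns (input variables) that still fits into 2 GB.
max_columns = TWO_GB // (n_shifts * 8)
print(max_columns)  # 178
```

Under these assumptions fewer than 200 input variables already fill 2 GB, so an encoding with atom and pair variables can exceed the limit easily.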


Bigger dataset – smaller mean error!


The final results

Faster by 3 orders of magnitude!


Prediction time: the past and present

C29H32N2O5


Conclusions

  • Combining a “new” method with an old, well-known algorithm can produce very good (and unexpected) results

