


Prediction of NMR Chemical Shifts. A Chemometrical Approach

K.A. Blinov, Y.D. Smurnyy, T.S. Churanova, M.E. Elyashberg

Advanced Chemistry Development (ACD)



Sometimes the solution is not obvious

  • In many cases several candidate structures are consistent with the spectral data.

  • In this case we need a method to rank the structures.

  • The most powerful method is to compare the experimental and predicted 13C NMR spectra.


13C NMR spectral data

[Figure: experimental and predicted 13C NMR spectra. Values shown on the slide: 2.00 and 9.62.]


How to find the best structure?

  • In most cases the predicted spectrum of the “correct structure” gives the best fit to the experimental spectrum.

  • In practice, the average deviation between predicted and experimental spectra for the “correct structure” is 2-3 ppm.
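The ranking idea can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names are hypothetical, the shift values are made up, and peaks are paired by sorted order because the slides do not say how experimental and predicted signals are matched.

```python
def average_deviation(experimental, predicted):
    """Mean absolute deviation (ppm) between two shift lists.

    Assumption: peaks are paired by sorted order, since no
    signal assignment is available in this sketch.
    """
    if len(experimental) != len(predicted):
        raise ValueError("spectra must contain the same number of shifts")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)


def rank_structures(experimental, candidates):
    """Return (name, deviation) tuples, best-fitting candidate first."""
    scored = [(name, average_deviation(experimental, shifts))
              for name, shifts in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Made-up shifts (ppm) for two hypothetical candidate structures.
experimental = [20.0, 60.0, 130.0]
candidates = {
    "A": [22.0, 58.0, 131.0],
    "B": [30.0, 70.0, 140.0],
}
ranking = rank_structures(experimental, candidates)
print(ranking[0][0])  # name of the best-ranked candidate
```

A candidate whose deviation falls in the 2-3 ppm range quoted above would be a plausible "correct structure"; larger deviations push a candidate down the list.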


The role of the spectra prediction

  • Real-world task: an unknown structure with molecular formula C29H32N2O5 and spectral data (1D and 2D NMR).

  • 20 min to generate all candidate structures (> 12 000).

  • 24 hours to predict the 13C NMR spectra of all the generated structures.

  • The speed of spectra prediction should be increased.
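As a rough worked figure from the numbers above (taking exactly 12 000 structures, although the slide says "more than"), the average prediction time per structure is at most:

$$\frac{24 \times 3600\ \text{s}}{12\,000\ \text{structures}} = 7.2\ \text{s per structure}$$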


Methods of the prediction of NMR spectra

  • Quantum mechanics: extremely slow

  • Database approach: accurate but slow

    • HOSE codes

    • Maximum common substructure

  • Rule-based: fast but inaccurate

    • Additive scheme

    • Neural networks

  • Our choice: improve the accuracy of the fast method


Additive scheme

$$\delta = \sum_i a_i x_i$$

Example:

$$\delta = 153.71 - 1.85 - 4.49 - 1.39 - 2.79 + 1.43 + 0.52 + 0.52 - 1.35 = 144.31$$

Main problem: find the correct values of the atom increments.
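The arithmetic of the additive scheme can be sketched as follows. The numbers are the ones shown on the slide; treating 153.71 as a starting value and reading each x_i as the number of times an increment a_i occurs are assumptions of this sketch, and the function name is hypothetical.

```python
def additive_shift(base, increments):
    """Chemical shift as a base value plus the sum of a_i * x_i.

    increments: iterable of (a_i, x_i) pairs, where a_i is an
    increment in ppm and x_i is how many times it applies.
    """
    return base + sum(a * x for a, x in increments)


# Values from the slide; 0.52 appears twice, so its count is 2.
base_value = 153.71
increments = [
    (-1.85, 1), (-4.49, 1), (-1.39, 1),
    (-2.79, 1), (1.43, 1), (0.52, 2),
    (-1.35, 1),
]
shift = additive_shift(base_value, increments)
print(round(shift, 2))  # 144.31, matching the total on the slide
```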


Available data

  • We have a database of 1.5 million 13C chemical shifts.

  • We can try to obtain the correct increment values from it!


How to encode atom environment

[Figure: the environment of an atom is encoded as the number of atoms of each type (CH3, CH2, CH, C, O) in the 1st and 2nd spheres around it. These counts are the input variables. Counts shown on the slide: 2, 1, 1, 1, 1, 1.]
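A minimal sketch of such an encoding, assuming the molecule is given as an adjacency list of heavy atoms with a type label per atom. The data layout, the function name and the toy molecule (2-propanol) are illustrative assumptions, not taken from the slides.

```python
from collections import Counter, deque


def encode_environment(adjacency, atom_types, center, max_sphere=2):
    """Count atom types per sphere (bond distance) around `center`.

    Returns a Counter keyed by (sphere, atom_type).
    """
    distance = {center: 0}
    queue = deque([center])
    counts = Counter()
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_sphere:
            continue  # do not look beyond the outermost sphere
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                counts[(distance[neighbor], atom_types[neighbor])] += 1
                queue.append(neighbor)
    return counts


# Toy molecule: 2-propanol, heavy atoms only (hypothetical labels).
atom_types = {0: "CH3", 1: "CH", 2: "CH3", 3: "OH"}
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

features = encode_environment(adjacency, atom_types, center=0)
print(dict(features))
```

Breadth-first search guarantees that each atom is assigned to the sphere of its shortest bond path, which also handles rings.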


Data for PLS regression

  • X: atom environment encoding

  • Y: chemical shifts

  • Rows: samples
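The slides name PLS regression but give no algorithmic details. As an illustration only, here is a pure-Python sketch of single-response PLS (the NIPALS PLS1 variant, chosen here as an assumption). The function names and the tiny data set are invented; a real training set of this size would need an optimized numerical library.

```python
def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS. X: list of rows, y: list of responses."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in y to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one encoded environment x."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


# Invented toy data: two count variables, shifts built from two increments.
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [153.71 - 1.85 * row[0] + 0.52 * row[1] for row in X]
model = pls1_fit(X, y, n_components=2)
print(round(pls1_predict(model, [2, 0]), 2))
```

With as many components as there are independent variables, PLS reproduces an ordinary least-squares fit; using fewer components is what makes it robust when the encoded variables are strongly correlated.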


Find the best structure encoding

  • The best scheme of structure representation is not evident in advance.

  • We should find the scheme that gives the best accuracy.

  • We should optimize:

    • the substituent coding scheme

    • the number of “spheres” used


Data used

  • 210,000 chemical shifts were used as the training set.

  • 170,000 chemical shifts from recent literature were used as an external validation set.


How to describe atom type

Each “substituent” atom is described relative to the “central” atom. Example values shown on the slide:

  • Atom type (C, O, etc.): 7 (N)

  • Hybridization (sp3, sp2, etc.): 1 (sp3)

  • Valence: 3

  • Number of neighbor H: 2

  • Charge: 0

  • Distance to “central” atom (bonds): 3
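One way to hold such a description in code is an immutable record that can be used directly as a counting key. The class and field names are hypothetical; the example values are the ones shown on the slide, and the numeric hybridization code (1 for sp3) is copied from it without knowing the full coding table.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomDescriptor:
    element: int        # atomic number, e.g. 7 for N
    hybridization: int  # numeric code; the slide shows 1 for sp3
    valence: int
    hydrogens: int      # number of attached H atoms
    charge: int
    distance: int       # bonds to the "central" atom


# The example from the slide: an NH2 group three bonds away.
nh2 = AtomDescriptor(element=7, hybridization=1, valence=3,
                     hydrogens=2, charge=0, distance=3)

# Frozen dataclasses are hashable, so identical descriptors
# collapse into a single counted variable.
variables = Counter([nh2, AtomDescriptor(7, 1, 3, 2, 0, 3)])
print(variables[nh2])  # 2
```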


Result for different atom encodings


Result for number of spheres


Is it the best possible accuracy?

  • The best average deviation achieved so far is 3.5 ppm.

  • We need less than 3 ppm (2 ppm is preferable).

  • Should we use additional variables?

  • We should be very careful when adding variables.


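Substituent interference (cross effect)

[Figure: chemical shifts (ppm) illustrating substituent interference. Values shown on the slide: 122.90, 125.38, 134.16, 138.30, 125.90, 141.48, 136.64, 127.86, 145.42; differences +11.26 and +2.48; deviations Δ of -1.94, +1.34 and -3.94.]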
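One possible reading of these numbers, offered as an assumption because the figure itself is not preserved: 122.90 ppm is the shift without substituents, and the two differences are single-substituent increments. The values are at least arithmetically consistent with that reading:

$$122.90 + 2.48 = 125.38, \qquad 122.90 + 11.26 = 134.16$$

$$122.90 + 2.48 + 11.26 = 136.64$$

Under this reading, 136.64 ppm is what a purely additive scheme would predict when both substituents are present, and the Δ values of up to 3.94 ppm in magnitude are the part that independent increments cannot capture.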


Enhanced structure encoding

[Figure: the input variables now include, in addition to the atom counts, the number of pairs of atoms (crosses) of each pair type. Examples shown on the slide: one “CH2 and CH” pair and one “C and O” pair.]
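A minimal sketch of how pair variables could be counted, assuming the substituents of one central atom are already available as (sphere, atom type) tuples. The function name and the input layout are hypothetical, and the sketch ignores the distance between the two atoms of a pair, which the slides later treat as a tunable parameter.

```python
from collections import Counter
from itertools import combinations


def encode_crosses(substituents):
    """Count unordered pairs of substituents around one central atom.

    substituents: list of (sphere, atom_type) tuples.
    Returns a Counter keyed by a sorted pair of such tuples.
    """
    crosses = Counter()
    for first, second in combinations(substituents, 2):
        crosses[tuple(sorted((first, second)))] += 1
    return crosses


# Hypothetical environment: two CH3 groups and one OH in the 1st sphere.
substituents = [(1, "CH3"), (1, "CH3"), (1, "OH")]
crosses = encode_crosses(substituents)
print(dict(crosses))
```

The number of pair variables grows roughly quadratically with the number of atom variables, which is one reason to be careful about how many spheres and pair distances are included.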


Result for atom pairs (crosses)

[Chart: mean error (ppm) as a function of the distance between atoms within a cross and the number of spheres.]


More enhancements?

  • The accuracy is now good enough (2.3 ppm).

  • But it is still bad in some cases.

  • Unfortunately, these cases are very important.

  • These “special” cases should be taken into account.


Stereo effects: double bonds

  • We use “topological” distance (number of bonds).

  • Sometimes equal topological distances correspond to different “real” distances.

[Figure: two arrangements around a double bond with the same topological distance: 25.7 at 3.9 Å and 17.6 at 2.9 Å.]


Modified structure encoding

Variables:

  • Atoms

  • Pairs of atoms (crosses)

  • “Stereo” effects



Size of training set

  • We have 1.5 million chemical shifts.

  • We should try to use all available data.

  • Only one problem: matrix size.

  • In many cases the matrix size exceeds 2 GB.
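A quick estimate shows how easily this limit is reached. The sketch assumes a dense matrix of 8-byte floating-point values and takes 2 GB as 2 × 1024³ bytes; the slides do not state the storage format, so both are assumptions.

```python
def dense_matrix_bytes(n_rows, n_columns, bytes_per_value=8):
    """Memory needed for a dense n_rows x n_columns matrix."""
    return n_rows * n_columns * bytes_per_value


def max_columns(n_rows, limit_bytes, bytes_per_value=8):
    """Largest number of columns that still fits under limit_bytes."""
    return limit_bytes // (n_rows * bytes_per_value)


N_SHIFTS = 1_500_000        # rows: one per chemical shift
LIMIT = 2 * 1024 ** 3       # assumed meaning of "2 GB"

print(max_columns(N_SHIFTS, LIMIT))                 # 178
print(dense_matrix_bytes(N_SHIFTS, 200) > LIMIT)    # True
```

Under these assumptions, fewer than 200 variables are enough to exceed the limit when all 1.5 million shifts are used.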



The final results

Faster by 3 orders of magnitude!


Prediction time: the past and present

C29H32N2O5


Conclusions

  • Combining a “new” method with an old, well-known algorithm can produce a very good (and unexpected) result.

