Prediction of NMR Chemical Shifts. A Chemometrical Approach

K.A. Blinov, Y.D. Smurnyy, T.S. Churanova, M.E. Elyashberg

Advanced Chemistry Development (ACD)

Structure and its spectral data

(Figure: a structure and its spectra.)

Sometimes the solution is not obvious
  • In many cases, several candidate structures are consistent with the spectral data.
  • We then need a method to rank these structures.
  • The most powerful method is to compare the experimental 13C NMR spectrum with the predicted 13C NMR spectrum of each candidate.
13C NMR spectral data

(Figure: experimental and predicted 13C NMR spectra; values shown: 2.00 and 9.62.)

How to find the best structure?
  • In most cases, the predicted spectrum of the “correct structure” fits the experimental spectrum best.
  • In practice, the average deviation between the predicted and experimental spectra of the “correct structure” is 2-3 ppm.
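The ranking criterion on this slide can be sketched as follows. This is a minimal illustration, not ACD's implementation: the function names are hypothetical, the predicted shifts are assumed to be already available for each candidate, and experimental and predicted signals are paired simply by sorting both lists (the slides do not say how signals are matched).

```python
from typing import Dict, List, Sequence, Tuple


def mean_abs_deviation(experimental: Sequence[float],
                       predicted: Sequence[float]) -> float:
    """Average |delta_exp - delta_pred| in ppm.

    Assumption: signals are paired by sorting both lists, a simple stand-in
    for a real assignment of predicted shifts to experimental peaks.
    """
    if not experimental or len(experimental) != len(predicted):
        raise ValueError("spectra must be non-empty and have the same number of signals")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)


def rank_candidates(experimental: Sequence[float],
                    candidates: Dict[str, Sequence[float]]) -> List[Tuple[float, str]]:
    """Return (deviation, name) pairs sorted so the best-fitting candidate comes first."""
    scored = [(mean_abs_deviation(experimental, shifts), name)
              for name, shifts in candidates.items()]
    return sorted(scored)


# Toy data (made up): candidate "A" fits the experimental spectrum far better than "B".
experimental = [10.0, 20.0, 30.0]
candidates = {"A": [11.0, 19.0, 30.0], "B": [15.0, 25.0, 40.0]}
for deviation, name in rank_candidates(experimental, candidates):
    print(f"{name}: {deviation:.2f} ppm")  # prints "A: 0.67 ppm", then "B: 6.67 ppm"
```

With real data, symmetry-equivalent carbons and overlapping or missing signals would need more careful matching than sorting.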
The role of spectrum prediction
  • Real-world task: an unknown compound with molecular formula C29H32N2O5 and 1D and 2D NMR spectral data.
  • Generating all candidate structures (more than 12,000) took 20 min.
  • Predicting the 13C NMR spectra of all these structures took 24 hours.
  • Spectrum prediction therefore has to be made faster.
Methods for predicting NMR spectra
  • Quantum mechanics: extremely slow
  • Database approach: accurate but slow
    • HOSE codes
    • Maximum common substructure
  • Rule-based: fast but inaccurate
    • Additive scheme
    • Neural networks
  • Our choice: improve the accuracy of a fast method
Additive scheme

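$$\delta = \sum_i a_i x_i$$

where $a_i$ is the increment contributed by an atom of a given type and $x_i$ is the number of such atoms. The example on the slide starts from a base value of 153.71 ppm and adds the increments of the surrounding atoms:

$$\delta = 153.71 - 1.85 - 4.49 - 1.39 - 2.79 + 1.43 + 0.52 + 0.52 - 1.35 = 144.31\ \text{ppm}$$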

Main problem: finding the correct values of the atom increments.
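Once the increments are known, evaluating the additive formula is just a weighted sum. A minimal sketch that reproduces the slide's worked example; the fragment labels (`base`, `frag1` to `frag8`) are hypothetical placeholders, and treating 153.71 as a base value that occurs once is an assumption.

```python
from typing import Dict


def additive_shift(increments: Dict[str, float], counts: Dict[str, int]) -> float:
    """Additive scheme: delta = sum_i a_i * x_i.

    increments maps a fragment label to its increment a_i (ppm);
    counts maps the same labels to the number of occurrences x_i.
    """
    return sum(increments[label] * n for label, n in counts.items())


# Hypothetical labels; the numbers are the terms of the worked example on the slide,
# with 153.71 treated as a base value that occurs once.
slide_increments = {
    "base": 153.71,
    "frag1": -1.85, "frag2": -4.49, "frag3": -1.39, "frag4": -2.79,
    "frag5": 1.43, "frag6": 0.52, "frag7": 0.52, "frag8": -1.35,
}
slide_counts = {label: 1 for label in slide_increments}

print(round(additive_shift(slide_increments, slide_counts), 2))  # 144.31
```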

Available data
  • We have a database of 1.5 million 13C chemical shifts.
  • We can try to derive the correct increment values from these data!
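In its simplest form, deriving increments from a database is a linear regression: each observed shift contributes one equation δ = Σ a_i x_i with unknown a_i. A sketch using ordinary least squares via `numpy.linalg.lstsq` on synthetic data with made-up increments; the function name is hypothetical, and the presentation itself uses PLS regression rather than plain least squares.

```python
import numpy as np


def fit_increments(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares estimate of the increments a in y ~ X @ a.

    X: (n_shifts, n_variables) counts x_i describing each atom's environment;
       include a column of ones so that a base value is fitted too.
    y: (n_shifts,) observed chemical shifts, ppm.
    """
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a


# Synthetic "database" with made-up increments, just to show the fit recovers them.
rng = np.random.default_rng(0)
true_a = np.array([128.5, 9.3, -2.1, 0.7])
X = np.column_stack([np.ones(200), rng.integers(0, 3, size=(200, 3))]).astype(float)
y = X @ true_a + rng.normal(scale=0.1, size=200)
print(np.round(fit_increments(X, y), 1))
```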
How to encode the atom environment

(Figure: the environment of the central atom is encoded as input variables: the number of atoms of each type (CH3, CH2, CH, C, O, ...) in the 1st sphere, the 2nd sphere, and so on.)
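One way to turn the sphere picture into input variables is a breadth-first search from the central atom, counting atom types by topological distance. In this sketch the molecule is a hypothetical list of heavy-atom labels (element plus attached hydrogens, e.g. `CH3`) and a bond list; `encode_environment` is an illustrative name, and the exact atom-type vocabulary ACD used is not given on the slide.

```python
from collections import Counter, deque
from typing import Dict, List, Tuple


def encode_environment(atom_types: List[str],
                       bonds: List[Tuple[int, int]],
                       center: int,
                       n_spheres: int) -> Counter:
    """Count atoms of each type in spheres 1..n_spheres around `center`.

    Sphere k holds the atoms exactly k bonds away (topological distance).
    The result is keyed by (sphere, atom_type); each key is one input variable.
    """
    neighbours: Dict[int, List[int]] = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Breadth-first search gives shortest-path (bond-count) distances.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if distance[atom] == n_spheres:
            continue  # atoms further out are not needed
        for nb in neighbours[atom]:
            if nb not in distance:
                distance[nb] = distance[atom] + 1
                queue.append(nb)

    return Counter((d, atom_types[i]) for i, d in distance.items() if 1 <= d <= n_spheres)


# Toy molecule: heavy atoms of butan-2-ol, CH3(0)-CH(1)(-OH 4)-CH2(2)-CH3(3).
env = encode_environment(["CH3", "CH", "CH2", "CH3", "OH"],
                         [(0, 1), (1, 2), (2, 3), (1, 4)], center=2, n_spheres=2)
print(sorted(env.items()))
```

For the CH2 carbon in this toy example, the first sphere contains one CH and one CH3, and the second sphere one CH3 and one OH.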

Data for PLS regression

(Figure: X holds the atom-environment encoding, Y the chemical shifts; rows are samples.)
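The slides name PLS regression but not the algorithm or software used. A minimal NIPALS PLS1 sketch in NumPy (one response column: the chemical shift), where X holds the environment counts and each row is one atom with a known shift; the function names are hypothetical. scikit-learn's `sklearn.cross_decomposition.PLSRegression` is a ready-made alternative.

```python
import numpy as np


def pls1_fit(X: np.ndarray, y: np.ndarray, n_components: int):
    """PLS1 regression (single response) fitted with the NIPALS algorithm.

    Returns (B, x_mean, y_mean) such that y_hat = (X - x_mean) @ B + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean              # centred residuals, deflated below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                            # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                         # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                               # scores of this component
        tt = t @ t
        p = Xr.T @ t / tt                        # X loadings
        c = (yr @ t) / tt                        # y loading
        Xr = Xr - np.outer(t, p)                 # deflate X ...
        yr = yr - c * t                          # ... and y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        raise ValueError("no PLS component could be extracted")
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)          # coefficients in terms of the original X
    return B, x_mean, y_mean


def pls1_predict(model, X: np.ndarray) -> np.ndarray:
    B, x_mean, y_mean = model
    return (X - x_mean) @ B + y_mean


# Toy data: count variables and shifts generated from made-up increments.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(50, 4)).astype(float)
y = 100.0 + X @ np.array([1.5, -2.0, 0.5, 3.0])
model = pls1_fit(X, y, n_components=2)
print(np.abs(pls1_predict(model, X) - y).mean())
```

With as many components as linearly independent variables, PLS1 reproduces the ordinary least-squares fit; using fewer components acts as a form of regularization, which helps when many count variables are correlated or rarely populated.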

Finding the best structure encoding
  • The best scheme for representing structures is not evident in advance.
  • We need to find the scheme that gives the best accuracy.
  • We need to optimize:
    • the substituent coding scheme
    • the number of “spheres” used
Data used
  • 210K chemical shifts were used as the training set.
  • 170K chemical shifts from recent literature were used as an external validation set.
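Choosing the encoding amounts to model selection against held-out data. A sketch, assuming the criterion is the mean absolute error on the external validation set (the result slides report a mean error in ppm). `select_n_spheres` and its `train_and_predict` callback are hypothetical; in practice the callback would wrap the encoding and regression steps. The toy errors in the example are made up.

```python
from typing import Callable, Iterable, Sequence, Tuple


def mean_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error in ppm."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def select_n_spheres(
    candidates: Iterable[int],
    train_and_predict: Callable[[int], Tuple[Sequence[float], Sequence[float]]],
) -> Tuple[int, float]:
    """Return (n_spheres, validation error) for the encoding depth with the lowest error.

    train_and_predict(n) is a hypothetical callback: encode the training set with
    n spheres, fit the model, and return (observed, predicted) shifts for the
    external validation set.
    """
    best = None
    for n in candidates:
        observed, predicted = train_and_predict(n)
        err = mean_error(observed, predicted)
        if best is None or err < best[1]:
            best = (n, err)
    if best is None:
        raise ValueError("no sphere counts to try")
    return best


# Stand-in callback with made-up validation errors for 2, 3 and 4 spheres.
fake_errors = {2: 3.0, 3: 2.3, 4: 2.5}


def fake_train_and_predict(n: int):
    return [100.0], [100.0 + fake_errors[n]]


print(select_n_spheres([2, 3, 4], fake_train_and_predict))
```

The same loop could compare substituent coding schemes instead of sphere counts.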
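How to describe the atom type

Each “substituent” atom around the “central” atom is described by:
  • Atom type (C, O, etc.)
  • Hybridization (sp3, sp2, etc.)
  • Valence
  • Number of neighboring H atoms
  • Charge
  • Distance to the “central” atom (in bonds)

(Figure: example substituent encoded as atom type 7 (N), hybridization 1 (sp3), valence 3, 2 H atoms, charge 0, distance 3.)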
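The six properties above can be bundled into one hashable key, so that identical substituents at the same distance map to the same input variable. A sketch using a frozen Python dataclass; the class, field and function names are hypothetical, and apart from the slide's example (1 = sp3) the numeric hybridization codes are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SubstituentDescriptor:
    """One substituent atom, described relative to the central atom.

    Fields follow the slide; the integer code for hybridization
    (1 = sp3 in the slide's example) is otherwise an assumption.
    """
    atomic_number: int  # atom type, e.g. 7 for N
    hybridization: int  # coded hybridization, e.g. 1 for sp3
    valence: int
    n_hydrogens: int    # number of attached H atoms
    charge: int
    distance: int       # topological distance to the central atom, in bonds


def count_substituents(descriptors: Iterable[SubstituentDescriptor]) -> Counter:
    """Frozen dataclasses are hashable, so identical descriptors collapse into one variable."""
    return Counter(descriptors)


# The slide's example: sp3 N, valence 3, two H, no charge, three bonds away (an NH2 group).
nh2 = SubstituentDescriptor(atomic_number=7, hybridization=1, valence=3,
                            n_hydrogens=2, charge=0, distance=3)
print(count_substituents([nh2, nh2]))
```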

Is it the best possible accuracy?
  • The best possible average deviation is 3.5 ppm.
  • We need less than 3 ppm (2 ppm is preferable).
  • Should we use additional variables?
  • We should be very careful when adding variables.
Substituent interference (cross effect)

(Figure: substituent increments of +11.26 and +2.48 ppm combined in substituted compounds; the observed shifts deviate from the additive values by Δ = -1.94, +1.34 and -3.94 ppm.)

Enhanced structure encoding

(Figure: in addition to atom counts, the input variables include counts of pairs of atoms (crosses) by pair type, e.g. one “CH2 and CH” pair and one “C and O” pair.)
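Crosses extend the encoding from single atoms to pairs of atoms, giving the regression explicit terms for interference between substituents. A sketch, assuming a cross is an unordered pair of environment atoms counted by its pair of atom types, and (since the result slide varies the distance between atoms within a cross) that only pairs up to `max_pair_distance` bonds apart are counted. The function name and the hand-written distance table are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def count_crosses(env_atoms: List[int],
                  atom_types: List[str],
                  dist: Dict[Tuple[int, int], int],
                  max_pair_distance: int) -> Counter:
    """Count pairs of environment atoms ("crosses") by unordered pair of atom types.

    env_atoms: indices of the atoms in the central atom's environment.
    dist: topological distance between two atoms, keyed by (i, j) with i < j.
    Only pairs at most max_pair_distance bonds apart are counted.
    """
    crosses = Counter()
    for i, j in combinations(sorted(env_atoms), 2):
        if dist[(i, j)] <= max_pair_distance:
            crosses[tuple(sorted((atom_types[i], atom_types[j])))] += 1
    return crosses


# Toy example: environment of the CH2 carbon in butan-2-ol (heavy atoms 0..4),
# with pairwise bond distances written out by hand.
atom_types = ["CH3", "CH", "CH2", "CH3", "OH"]
dist = {(0, 1): 1, (0, 3): 3, (0, 4): 2, (1, 3): 2, (1, 4): 1, (3, 4): 3}
print(count_crosses([0, 1, 3, 4], atom_types, dist, max_pair_distance=1))
```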

Results for atom pairs (crosses)

(Figure: mean error, ppm, as a function of the number of spheres and of the distance between atoms within a cross.)

More enhancements?
  • Accuracy is now good enough (2.3 ppm).
  • However, it is still poor in some cases.
  • Unfortunately, these cases are very important.
  • These “special” cases must be taken into account.
Stereo effects: double bonds
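  • We use the “topological” distance (number of bonds).
  • Sometimes equal topological distances correspond to different “real” (spatial) distances.

(Figure: two double-bond examples with the same topological distance; spatial distances 3.9 Å and 2.9 Å, with shifts of 25.7 and 17.6 ppm respectively.)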

Modified structure encoding

(Figure: the input variables now comprise three blocks: atoms, pairs of atoms (crosses), and “stereo” effects.)
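Before regression, the three blocks of variables (atoms, crosses, stereo effects) have to be laid out as columns of a single X matrix. A sketch, assuming each block is a `Counter` of feature keys; the block prefixes, helper names and the stereo key `("CH3", "cis")` are hypothetical, since the slide does not show how stereo effects are coded.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence

import numpy as np


def merge_blocks(atoms: Counter, crosses: Counter, stereo: Counter) -> Counter:
    """Prefix each key with its block name so atom, cross and stereo variables never collide."""
    merged = Counter()
    for block_name, block in (("atom", atoms), ("cross", crosses), ("stereo", stereo)):
        for key, count in block.items():
            merged[(block_name, key)] += count
    return merged


def build_vocabulary(samples: Sequence[Counter]) -> Dict[Hashable, int]:
    """Assign a column index to every variable seen in the training samples."""
    vocab: Dict[Hashable, int] = {}
    for sample in samples:
        for key in sample:
            vocab.setdefault(key, len(vocab))
    return vocab


def to_matrix(samples: Sequence[Counter], vocab: Dict[Hashable, int]) -> np.ndarray:
    """Dense X matrix (one row per sample); variables missing from vocab are dropped."""
    X = np.zeros((len(samples), len(vocab)))
    for row, sample in enumerate(samples):
        for key, count in sample.items():
            col = vocab.get(key)
            if col is not None:
                X[row, col] = count
    return X


# Two toy samples; the stereo key ("CH3", "cis") is a hypothetical label.
s1 = merge_blocks(Counter({(1, "CH"): 1}), Counter({("CH", "CH3"): 1}), Counter({("CH3", "cis"): 1}))
s2 = merge_blocks(Counter({(1, "CH"): 2}), Counter(), Counter())
vocab = build_vocabulary([s1, s2])
print(to_matrix([s1, s2], vocab))
```

For a training set of 1.5 million shifts, a dense matrix like this is exactly what hits memory limits; a sparse format such as `scipy.sparse.csr_matrix` is the usual alternative.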

Size of training set
  • We have 1.5 million chemical shifts.
  • We should try to use all available data.
  • There is only one problem: the matrix size.
  • In many cases the matrix exceeds 2 GB.
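A rough size check shows why: with 1.5 million rows, a dense float64 matrix passes 2 GiB (2^31 bytes) at about 179 columns, since 2^31 / (8 × 1.5×10^6) ≈ 179. The slides do not say how this was resolved. One common workaround, shown here as an assumption rather than the authors' method, is to stream the data in chunks and keep only X^T X and X^T y, which ordinary least squares needs and from which kernel variants of PLS can also work. The function names are hypothetical.

```python
import numpy as np

GIB = 1024 ** 3  # 2 GB taken here as 2 GiB = 2**31 bytes


def dense_matrix_bytes(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Memory of a dense matrix; itemsize 8 bytes = float64."""
    return n_rows * n_cols * itemsize


def accumulate_cross_products(chunks, n_features: int):
    """Stream (X_chunk, y_chunk) pairs and sum X^T X and X^T y.

    Memory stays O(n_features**2) however many rows there are. Least-squares
    increments follow from np.linalg.solve(xtx, xty); add a column of ones to
    X (or track column sums) to account for the intercept/mean.
    """
    xtx = np.zeros((n_features, n_features))
    xty = np.zeros(n_features)
    for X_chunk, y_chunk in chunks:
        xtx += X_chunk.T @ X_chunk
        xty += X_chunk.T @ y_chunk
    return xtx, xty


# 1.5 million shifts: the dense float64 X passes 2 GiB at about 179 columns.
print(dense_matrix_bytes(1_500_000, 179) > 2 * GIB)  # True
```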
The final results

About three orders of magnitude faster!
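A back-of-the-envelope reading of the speed-up, assuming it applies uniformly to the 12,000-structure task described earlier (24 hours of prediction):

$$\frac{24 \times 3600\ \text{s}}{12\,000\ \text{structures}} \approx 7.2\ \text{s per structure}, \qquad \frac{7.2\ \text{s}}{10^{3}} \approx 7\ \text{ms per structure}, \qquad \frac{86\,400\ \text{s}}{10^{3}} \approx 86\ \text{s in total}$$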

Conclusions
  • Combining a “new” method (chemometrics, PLS regression) with an old, well-known algorithm (the additive scheme) can produce very good (and unexpected) results.