Prediction of NMR Chemical Shifts. A Chemometrical Approach

### Prediction of NMR Chemical Shifts. A Chemometrical Approach

K.A. Blinov, Y.D. Smurnyy, T.S. Churanova, M.E. Elyashberg

Advanced Chemistry Development (ACD)

Sometimes the solution is not obvious

- In many cases several structures are consistent with the spectral data.
- We then need a method to rank the structures.
- The most powerful method is to compare the experimental and predicted 13C NMR spectra.

How to find the best structure?

- In most cases the predicted spectrum of the “correct structure” fits the experimental spectrum best.
- In practice the “correct structure” has an average deviation of 2-3 ppm between the predicted and experimental spectra.
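
The ranking idea can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names, candidate names, and shift values are all hypothetical, and peaks are paired simply by sorting both spectra, which assumes equal peak counts.

```python
def mean_abs_deviation(predicted, experimental):
    """Average absolute deviation (ppm) between two 13C spectra.

    Assumption: peaks are paired by sorting both lists, so the spectra
    must contain the same number of shifts.
    """
    if len(predicted) != len(experimental):
        raise ValueError("spectra must have the same number of peaks")
    pairs = zip(sorted(predicted), sorted(experimental))
    return sum(abs(p - e) for p, e in pairs) / len(experimental)


def rank_candidates(candidates, experimental):
    """Return (name, deviation) pairs, best fit first."""
    scored = [(name, mean_abs_deviation(pred, experimental))
              for name, pred in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical shifts (ppm), for illustration only.
experimental = [14.1, 22.7, 31.9, 128.5, 170.2]
candidates = {
    "structure_A": [15.0, 23.5, 30.0, 130.0, 168.0],
    "structure_B": [20.0, 35.0, 45.0, 110.0, 150.0],
}
ranking = rank_candidates(candidates, experimental)
```

With these made-up numbers, `structure_A` ranks first because its average deviation falls in the low-ppm range, while `structure_B` deviates by much more.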

The role of the spectra prediction

- Real-world task: an unknown structure with molecular formula C29H32N2O5 and spectral data (1D and 2D NMR).
- 20 min to generate all structures (> 12 000).
- 24 hours to predict the 13C NMR spectra of all the generated structures.
- The speed of spectra prediction should be increased.

Methods of the prediction of NMR spectra

- Quantum Mechanics: extremely slow
- Database approach: accurate but slow
  - HOSE Codes
  - Maximum Common Substructure
- Rule-based: fast but inaccurate
  - Additive scheme
  - Neural Networks

Our choice: improve the accuracy of a fast method

Additive scheme

$$\delta = \sum_i a_i x_i$$

Worked example from the slide (values in ppm):

$$\delta = 153.71 - 1.85 - 4.49 - 1.39 - 2.79 + 1.43 + 0.52 + 0.52 - 1.35 = 144.31$$

Main problem: find the correct values of the atom increments.
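
The arithmetic of the additive scheme can be checked with a few lines of Python. The numbers are the ones shown on the slide; treating the first term, 153.71, as a starting value to which the increments are added is an assumption about how the slide should be read, and the names below are hypothetical.

```python
# Values read off the slide's worked example (all in ppm).
# Assumption: 153.71 is the starting value, the rest are increments.
BASE_SHIFT = 153.71
INCREMENTS = [-1.85, -4.49, -1.39, -2.79, 1.43, 0.52, 0.52, -1.35]


def additive_shift(base, increments):
    """Predict a chemical shift as the base value plus all increments."""
    return base + sum(increments)


predicted = additive_shift(BASE_SHIFT, INCREMENTS)
print(f"{predicted:.2f} ppm")  # prints "144.31 ppm"
```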

Available data

- We have a database of 1.5 million 13C chemical shifts.
- We can try to obtain correct values!
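
One common way to obtain increments from a set of known shifts is a least-squares fit. The slides do not say which regression method was used, so the sketch below is an assumption, written in plain Python on made-up data; the function name and the toy variables are hypothetical.

```python
def fit_increments(rows, shifts):
    """Least-squares fit of increments a in  shift = sum(a[i] * x[i]).

    rows   -- list of count vectors x (one per observed shift)
    shifts -- observed chemical shifts, same length as rows
    Solves the normal equations (X^T X) a = X^T y by Gauss-Jordan
    elimination; fine for a toy problem, not for 1.5 million rows.
    """
    n = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, shifts))]
           for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("singular system: variables are not independent")
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Hypothetical data: column 0 is a constant term, columns 1-2 count two
# atom types. Shifts were generated from increments (150.0, -2.0, 1.5).
rows = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 2, 1], [1, 1, 2]]
shifts = [150.0, 148.0, 151.5, 147.5, 151.0]
increments = fit_increments(rows, shifts)
```

Because the toy shifts were generated without noise, the fit recovers the generating increments; real data would leave residuals.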

How to encode atom environment

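[Figure: the environment of an atom is encoded as input variables. For each sphere around it (1st sphere, 2nd sphere, …), one variable per atom type (CH3, CH2, CH, C, O, …) holds the number of atoms of that type, e.g. 2 or 1.]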
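
A sphere-wise count of atom types can be sketched with a breadth-first search over the molecular graph. This is a minimal sketch under stated assumptions: the graph representation, the function name, and the example molecule are hypothetical, and the sphere number is taken to be the number of bonds from the central atom.

```python
from collections import Counter, deque


def encode_environment(atom_types, bonds, center, max_sphere=2):
    """Count atom types in each sphere around `center`.

    atom_types -- dict: atom index -> type label such as "CH3"
    bonds      -- list of (i, j) index pairs
    Returns {sphere number: Counter of type labels}, where the sphere
    number is the topological distance (in bonds) from the center.
    """
    neighbours = {i: [] for i in atom_types}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    distance = {center: 0}
    queue = deque([center])
    spheres = {s: Counter() for s in range(1, max_sphere + 1)}
    while queue:  # breadth-first search outwards from the center
        atom = queue.popleft()
        if distance[atom] == max_sphere:
            continue
        for nxt in neighbours[atom]:
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                spheres[distance[nxt]][atom_types[nxt]] += 1
                queue.append(nxt)
    return spheres


# Hypothetical molecule: 2-methylbutan-2-ol, centered on the C-OH carbon.
atom_types = {0: "C", 1: "CH3", 2: "CH3", 3: "CH2", 4: "CH3", 5: "O"}
bonds = [(0, 1), (0, 2), (0, 3), (3, 4), (0, 5)]
spheres = encode_environment(atom_types, bonds, center=0)
```

Flattening these counters in a fixed order of (sphere, atom type) would give one numeric input vector per central atom.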

Find best structure encoding

- Initially, the best scheme of structure representation is not evident.
- We should find the scheme with the best accuracy.
- We should optimize:
  - the substituent coding scheme
  - the number of “spheres” used

Used data

- 210 K chemical shifts were used as the training set.
- 170 K chemical shifts from recent literature were used as an external validation set.

How to describe atom type

Each “substituent” atom is described relative to the “central” atom. The slide's example values follow each property:

- Atom type (C, O, etc.): 7 (N)
- Hybridization (sp3, sp2, etc.): 1 (sp3)
- Valence: 3
- Number of neighbor H: 2
- Charge: 0
- Distance to “central” atom (bonds): 3
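
The six properties can be held in a small record type. The sketch below is hypothetical: the class and field names are invented for illustration, the numeric coding of hybridization is assumed, and the example values are taken from the slide on the assumption that they map to the properties in the listed order.

```python
from typing import NamedTuple


class AtomDescriptor(NamedTuple):
    """Hypothetical record of the six properties listed on the slide."""
    atomic_number: int      # atom type, e.g. 7 for N
    hybridization: int      # coded value, e.g. 1 for sp3 (coding assumed)
    valence: int
    hydrogens: int          # number of neighbor H
    charge: int
    distance: int           # bonds to the "central" atom


# The example values shown on the slide.
substituent = AtomDescriptor(7, 1, 3, 2, 0, 3)

# Being hashable, a descriptor can index a table of increments.
# The increment value here is a placeholder, not a fitted number.
increment_table = {substituent: 0.0}
```

Two atoms with identical descriptors then share one increment, which is what lets a finite table cover many structures.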

Is it the best possible accuracy?

- Best possible average deviation is 3.5 ppm.
- We need less than 3 ppm (2 is preferable).
- Should we use additional variables?
- We should be very careful adding variables.

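Substituent interference (cross effect)

[Figure: substituted structures annotated with 13C shifts (134.16, 138.30, 125.90, 141.48, 122.90; 136.64, 127.86, 145.42), substituent increments (+11.26, +2.48), and deviations Δ = -1.94, +1.34, -3.94.]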

Enhanced structure encoding

[Figure: the input variables now comprise two groups: atoms, and pairs of atoms (“crosses”). Each pair variable holds the number of pairs of a given atom pair type, e.g. CH2 and CH: 1; C and O: 1.]
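
Pair variables of this kind can be generated by counting unordered pairs of substituent types. The slides do not say whether all pairs or only selected ones were used, so counting every pair is an assumption here, and the function name and input are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_counts(substituent_types):
    """Count unordered pairs of substituent types ("crosses").

    Each pair label is sorted so that ("O", "C") and ("C", "O")
    fall into the same variable.
    """
    pairs = Counter()
    for a, b in combinations(substituent_types, 2):
        pairs[tuple(sorted((a, b)))] += 1
    return pairs


# Hypothetical first-sphere substituents of one central atom.
crosses = pair_counts(["CH2", "CH", "O"])
```

The number of pair variables grows roughly with the square of the number of atom types, which is one reason to be careful when adding variables.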

More enhancements?

- The accuracy is now good enough (2.3 ppm).
- But it is still bad in some cases.
- Unfortunately these cases are very important.
- These “special” cases should be taken into account.

Stereo effects: double bonds

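- We use “topological” distance.
- Sometimes equal topological distances correspond to different “real” distances.

[Figure: two arrangements around a double bond with the same topological distance: shift 25.7 at a real distance of 3.9 Å, and shift 17.6 at 2.9 Å.]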
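
The difference between the two kinds of distance can be shown on a four-atom chain around a double bond. This is a sketch with invented, idealized planar coordinates (not taken from the slide); the function name is hypothetical.

```python
import math
from collections import deque


def topological_distance(bonds, start, goal):
    """Number of bonds on the shortest path between two atoms (BFS)."""
    neighbours = {}
    for i, j in bonds:
        neighbours.setdefault(i, []).append(j)
        neighbours.setdefault(j, []).append(i)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            return seen[atom]
        for nxt in neighbours.get(atom, []):
            if nxt not in seen:
                seen[nxt] = seen[atom] + 1
                queue.append(nxt)
    raise ValueError("atoms are not connected")


# C1-C2=C3-C4 chain; idealized planar 2D coordinates in angstroms
# (rough bond lengths and angles), made up for illustration.
bonds = [(1, 2), (2, 3), (3, 4)]
cis = {1: (-0.75, 1.30), 2: (0.0, 0.0), 3: (1.34, 0.0), 4: (2.09, 1.30)}
trans = {1: (-0.75, 1.30), 2: (0.0, 0.0), 3: (1.34, 0.0), 4: (2.09, -1.30)}

topo = topological_distance(bonds, 1, 4)   # 3 bonds in both isomers
d_cis = math.dist(cis[1], cis[4])          # shorter through-space distance
d_trans = math.dist(trans[1], trans[4])    # longer through-space distance
```

Both isomers give the same topological distance, so an encoding based on it alone cannot tell them apart; the geometric distances differ by about one angstrom.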

Size of training set

- We have 1.5 million chemical shifts.
- We should try to use all available data.
- Only one problem: matrix size.
- In many cases the matrix size exceeds 2 GB.
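
A rough estimate shows how quickly the 2 GB limit is reached. The sketch assumes a dense matrix with one row per chemical shift and 8-byte (double precision) values; the slides do not state the storage format, so these are assumptions, and the names are hypothetical.

```python
def dense_matrix_bytes(n_rows, n_columns, bytes_per_value=8):
    """Memory needed for a dense matrix (8 bytes = double precision)."""
    return n_rows * n_columns * bytes_per_value


N_SHIFTS = 1_500_000          # rows: one per chemical shift (from the slide)
LIMIT = 2 * 1024 ** 3         # 2 GB

# Largest number of input variables (columns) that still fits in 2 GB.
max_columns = LIMIT // (N_SHIFTS * 8)
```

Under these assumptions fewer than 200 input variables fit, far fewer than an encoding with several spheres and pair variables is likely to need.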

The final results

Faster by 3 orders of magnitude!

Prediction time: the past and present

C29H32N2O5

Conclusions

- Combining a “new” method with an old, well-known algorithm can produce a very good (and unexpected) result.
