Automated Identification of Small Organic Molecular Structures Best Matching NMR Data

1D Proton 1D Carbon Edited Gradient HSQC Gradient HMBC H Projection C Projection Gradient HSQC Automated Identification of Small Organic Molecular Structures Best Matching NMR Data Reinhard Dunkel* and Xinzi Wu, ScienceSoft LLC, Sandy, UT NMRanalyst: Spectral Analysis & Structure Elucidation Full structure elucidations often rely on several spectroscopic techniques and are normally performed by an experienced spectroscopist. But automated methods start to simplify this process. Dr. Péter Sándor, Varian Deutschland GmbH, provided 1D proton (34 s acquisition time), 1D carbon (4 h 51 m), edited gradient HSQC (48 m 2 s), and gradient HMBC (1 h 37 m) FIDs of 3.6 mg/ml prednisone in DMSO-d6 acquired on a Varian INOVA 500 MHz spectrometer. Single-bond H-C assignments are determined from an HSQC spectrum. Its editing allows to distinguish CH2 from CH3 and CH groups. HMBC observes mainly two- and three-bond correlations. NMRanalyst analyzes the shown spectra. Then the molecular skeleton(s) consistent with this incomplete and ambiguous spectral information are generated. The above figure shows the elucidated prednisone structure. An atom label indicates the shift in ppm. A bond label provides the H ppm shift over which the connectivity was established. findit Script: Small Organic Molecule Identification The NMRanalyst FindIt module contains over 14.5 million structures from NIH PubChem (http://nihroadmap.nih.gov). It matches the predicted carbon and proton shifts for structures in this collection to the observed carbon and proton shifts and spectral integrals determined by NMRanalyst. It reports the best structures consistent with the NMR data. New structures can be added to the FindIt structure database. Even if the correct structure is not in the FindIt collection, best matches are likely similar to an unknown molecule. FindIt can be invoked through the findit script. The findit script automatically identifies supported 1D FID or spectra in the current directory, converts them to the NMRanalyst format, analyzes the NMR data, and determines and displays the best matching structures. Its execution takes only minutes on a modern PC. The following figure shows the best matching structures for the prednisone 1D proton and 1D carbon FIDs as the result of the findit script. Determining most likely FindIt structures... 1: 0.964780 ( 5304) 2: 0.942377 ( 273117) 3: 0.929914 (16348068) 4: 0.929087 (16373846) 5: 0.926625 (10645330) 6: 0.925683 (17556896) 7: 0.924923 (21657150) 8: 0.923669 (16301457) 9: 0.922883 (11489638) 10: 0.922523 (17408781) Conclusions A nearly infinite number of small organic molecules can be synthesized. While the actual structure may not be contained, the 14.5 million PubChem structures provide a good representation of possible structures. Commonly acquired 1D proton and 1D carbon survey spectra are sufficiently informative for this identification. So full structure elucidations are often not required. The FindIt structure identification is simple to automate as illustrated by the findit script. The reported results were obtained with NMRanalyst 3.5. The software is available for Red Hat Linux (Enterprise 4, Enterprise 5, and compatible) and Microsoft Windows (2000, XP, and Vista with installation in compatibility mode). Please see www.ScienceSoft.net for further information and availability. Determining most likely FindIt structures... 1: 0.983420 ( 4900) 2: 0.959478 ( 9845968) 3: 0.959444 (11503548) 4: 0.945762 (21528905) 5: 0.939216 (17962854) 6: 0.937879 (21864630) 7: 0.934467 (21539784) 8: 0.928929 (11787664) 9: 0.928502 (22815241) 10: 0.925805 ( 3034026) The above listing shows the resulting structure rating (1 for perfect and 0 for no agreement) and the PubChem Compound ID in parentheses. The findit script uses available information such as 1D proton or 1D carbon FIDs, molecular weight range, molecular formula, HSQC projections, or any combination thereof. This approach identified the correct structure in most of our over 200 test cases (R. Dunkel, X. Wu, Identification of organic molecules from a structure database using proton and carbon NMR analysis results, Journal of Magnetic Resonance 188 (2007) 97-110). For these prednisone spectra, 14 carbon skeletons result. Heteroatoms, like oxygen, can be inferred from free fragment valences based on observed carbon chemical shifts. findit Script: Identify Structures From HSQC Only Dr. Péter Sándor, Varian Deutschland GmbH, provided this 20 mg strychnine in CDCl3 datasets, 3 mm tube, Varian INOVA 500 MHz spectrometer. The advantage of the FindIt structure identification is that heteroatoms such as the oxygen and nitrogen atoms of strychnine do not have to be detected. Their influence on proton and carbon shifts around them supports identification of the right structure. Strychnine is identified as the most likely structure. Planned Development The next step is to combine the NMRanalyst structure elucidation and FindIt structure identification functionalities. For example, incorporating HMBC correlation information likely results in a more effective FindIt identification of best matching structures among millions of candidate structures. And the structure elucidation can be extended to use the FindIt results to combine identified molecular skeletons. http://www.ScienceSoft.net

Automated Identification of Small Organic Molecular Structures Best Matching NMR Data

Automated Identification of Small Organic Molecular Structures Best Matching NMR Data

Presentation Transcript

Solving NMR structures II

Chem-806 Identification of organic and inorganic compounds by advance NMR techniques

Chem-806 Identification of organic and inorganic compounds by advance NMR techniques

Chem-806 Identification of organic and inorganic compounds by advance NMR techniques

Chem-806 Identification of organic and inorganic compounds by advance NMR techniques

Organic Structures

Molecular identification of fish species

Representation of molecular structures

Representation of molecular structures

Molecular Structures

Molecular Structures

Solution structures by NMR

Molecular Structures

Automated Identification Systems

Solving NMR structures I

MOLECULAR STRUCTURES

Automated Molecular Replacement

Molecular Structures

3 - NMR Data and Molecular Structure

Identification Of Organic Products

ORGANIC NMR INTERPRETATION

Automated Identification Systems