

Delivering an online service for validating and standardizing chemical structure files using the ChemSpider platform

Overview
  • Introduction
    • Why do we need to validate/standardise data?
    • Examples of problems in general
    • Examples of problems in ChemSpider
    • Why InChI is not enough
    • FDA rules
What are we trying to achieve?
  • Everyone wants high quality data
  • The ChemSpider team is building a reputation on data quality
  • Many data sources contain errors
  • We need to identify:
    • Errors
    • Inconsistencies
    • Data duplication or inappropriate separation of data
  • This requires a process of validation and standardization
What do we mean by Validation and Standardisation?
  • Validated
    • Check for hypervalency, charge balance, missing stereochemistry
    • Check name-structure relationships, etc.
  • Standardized
    • Use standard rules to “standardize” compounds: nitro groups, O-metal bonds, tautomers, etc.
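As a rough illustration of two of the validation checks listed above (hypervalency and charge balance), here is a minimal Python sketch. The `Molecule` class, the `MAX_VALENCE` table and the `validate` function are hypothetical names introduced for this example only; they are not part of CVSP, and the valence limits are an assumed, deliberately small subset.

```python
from dataclasses import dataclass, field

# Assumed upper limits on the bond-order sum of a neutral atom (illustrative subset).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


@dataclass
class Molecule:
    atoms: list                                  # element symbols
    charges: list                                # formal charge per atom
    bonds: list = field(default_factory=list)    # (i, j, bond_order) tuples


def validate(mol):
    """Return warnings for hypervalent neutral atoms and a non-zero net charge."""
    warnings = []
    bond_sum = [0] * len(mol.atoms)
    for i, j, order in mol.bonds:
        bond_sum[i] += order
        bond_sum[j] += order
    for idx, (symbol, charge) in enumerate(zip(mol.atoms, mol.charges)):
        limit = MAX_VALENCE.get(symbol)
        # Charged atoms and unlisted elements are skipped in this sketch.
        if limit is None or charge != 0:
            continue
        if bond_sum[idx] > limit:
            warnings.append(
                f"hypervalent {symbol} at atom {idx}: bond order sum {bond_sum[idx]} > {limit}"
            )
    net = sum(mol.charges)
    if net != 0:
        warnings.append(f"unbalanced charge: net formal charge {net:+d}")
    return warnings


# Nitromethane drawn with a pentavalent nitrogen: C-N(=O)=O
pentavalent = Molecule(["C", "N", "O", "O"], [0, 0, 0, 0],
                       [(0, 1, 1), (1, 2, 2), (1, 3, 2)])
# The same group drawn charge-separated: C-[N+](=O)[O-]
charge_separated = Molecule(["C", "N", "O", "O"], [0, 1, 0, -1],
                            [(0, 1, 1), (1, 2, 2), (1, 3, 1)])

print(validate(pentavalent))
print(validate(charge_separated))
```

The two nitro depictions show why validation and standardization go together: one drawing trips the valence check, while the other passes, although both are meant to represent the same compound.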
Where will CVSP (the Chemical Validation and Standardisation Platform) be useful?
  • Currently, it is a standalone system
  • In the future, validation/standardisation routines will be:
    • Built into our deposition system
    • Run at registration of new compounds
    • Used to improve existing data in ChemSpider by passing the ChemSpider backfile through them
  • Potential to offer an optional checking service to authors
What do we do now?
  • Currently, ChemSpider uses structures (as InChIs) as the database key
  • Structures are needed for depositions
  • Two steps:
    • Pre-processing prior to deposition
    • The InChI algorithm, which provides standardisation and mapping
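The idea of using the InChI as the database key can be sketched as follows. This is only an assumption-laden illustration: the record layout (`source_id`, `inchi`) and the function `group_by_inchi` are hypothetical, and the InChI strings are supplied precomputed rather than generated, since generating them requires the InChI software.

```python
from collections import defaultdict


def group_by_inchi(records):
    """Group deposited records by InChI string, which acts as the key."""
    groups = defaultdict(list)
    for record in records:
        inchi = (record.get("inchi") or "").strip()
        # Records without a structure have no key and are collected under None.
        groups[inchi or None].append(record["source_id"])
    return dict(groups)


records = [
    {"source_id": "A-1", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},  # ethanol
    {"source_id": "B-7", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},  # ethanol again
    {"source_id": "C-3", "inchi": "InChI=1S/CH4O/c1-2/h2H,1H3"},         # methanol
    {"source_id": "D-9", "inchi": ""},                                   # no structure
]

for key, ids in group_by_inchi(records).items():
    print(key, ids)
```

Records that share a key collapse onto one compound, which is the mapping behaviour described above; records with no structure cannot be keyed at all.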
What are the common errors?
  • Records without a structure
  • Incorrect valences
  • Atom labels
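Two of these errors, records without a structure and unexpected atom labels, can be detected directly from the text of an SD file. The sketch below assumes V2000 molblocks and a truncated element list; `split_sdf`, `check_record` and the demo helper `molfile` are hypothetical names for this example, and atom symbols are read by whitespace splitting rather than by strict fixed-width columns.

```python
# Assumed, truncated set of accepted element symbols.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I", "Na", "K"}


def split_sdf(text):
    """Split SDF text into records (lists of lines) on the '$$$$' delimiter."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append(current)
    return records


def check_record(lines):
    """Return warnings for one V2000 record: no structure or unknown atom labels."""
    if len(lines) < 4:
        return ["record too short to contain a molfile header"]
    try:
        n_atoms = int(lines[3][0:3])  # atom count: first three columns of the counts line
    except ValueError:
        return ["unreadable counts line"]
    if n_atoms == 0:
        return ["record without a structure"]
    warnings = []
    for line in lines[4:4 + n_atoms]:
        fields = line.split()
        symbol = fields[3] if len(fields) > 3 else "?"
        if symbol not in KNOWN_ELEMENTS:
            warnings.append(f"unrecognised atom label: {symbol}")
    return warnings


def molfile(name, atoms):
    """Build a minimal V2000 record (atoms only, no bonds) for the demo."""
    counts = f"{len(atoms):3d}  0  0  0  0  0  0  0  0  0999 V2000"
    atom_lines = [f"    0.0000    0.0000    0.0000 {s:<3} 0  0" for s in atoms]
    return "\n".join([name, "  sketch", "", counts] + atom_lines + ["M  END", "$$$$"])


text = "\n".join([molfile("ok", ["C", "O"]),
                  molfile("empty", []),
                  molfile("label", ["C", "Xx"])])

for record in split_sdf(text):
    print(record[0], check_record(record))
```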
What are the common errors?
  • Unbalanced charge
  • Name-structure errors
  • Salts
  • Polymers/Organometallics
  • Missing stereochemistry
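Salts and unbalanced charges can both be examined by splitting a record into its disconnected fragments and summing the formal charge of each. The following is a minimal sketch on a toy graph; `fragments` and `fragment_charges` are hypothetical helpers, and the example molecule lists heavy atoms only.

```python
def fragments(n_atoms, bonds):
    """Return the connected components of a molecular graph as sorted index lists."""
    neighbours = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, components = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nxt in neighbours[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


def fragment_charges(components, charges):
    """Sum the formal charges of the atoms in each fragment."""
    return [sum(charges[a] for a in component) for component in components]


# Sodium acetate, heavy atoms only: C, C, O, O(-), Na(+)
bonds = [(0, 1), (1, 2), (1, 3)]
charges = [0, 0, 0, -1, 1]

parts = fragments(5, bonds)
print(parts, fragment_charges(parts, charges))
```

A record with more than one fragment is a candidate salt or mixture, and a set of fragment charges that does not sum to zero points to an unbalanced charge.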
Side Effects of InChI on ChemSpider
  • Advantages and disadvantages
    • The depictions are meant to represent the same molecule
    • Not easy to pick out “bad” representations
Substance Registry System
  • How do you decide your standardisation rules?
  • Avoid standards in isolation

http://www.fda.gov/downloads/ForIndustry/DataStandards/SubstanceRegistrationSystem-UniqueIngredientIdentifierUNII/ucm127743.pdf

  • Note: This document is only a starting point
Validation rules

Rules are defined in XML.

Code is generated dynamically from the rule set.

The Indigo API is used behind the scenes.
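To make the idea of generating checks from an XML rule set concrete, here is a small sketch. The XML schema shown (element and attribute names such as `rule`, `property`, `op`) is invented for illustration and is not the CVSP rule format; the record properties are supplied as a plain dictionary, whereas a real system would presumably compute them with a cheminformatics toolkit.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical rule format, for illustration only.
RULES_XML = """
<rules>
  <rule severity="error" property="atom_count" op="eq" value="0"
        message="Record has no structure"/>
  <rule severity="warning" property="net_charge" op="ne" value="0"
        message="Unbalanced charge"/>
</rules>
"""

OPS = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt, "lt": operator.lt}


def compile_rules(xml_text):
    """Turn each <rule> element into a (severity, message, predicate) tuple."""
    compiled = []
    for rule in ET.fromstring(xml_text.strip()).iter("rule"):
        prop = rule.get("property")
        compare = OPS[rule.get("op")]
        threshold = int(rule.get("value"))

        # Default arguments bind the current rule's values to this predicate.
        def predicate(record, prop=prop, compare=compare, threshold=threshold):
            return compare(record[prop], threshold)

        compiled.append((rule.get("severity"), rule.get("message"), predicate))
    return compiled


def run_rules(compiled, record):
    """Return (severity, message) for every rule that fires on the record."""
    return [(sev, msg) for sev, msg, pred in compiled if pred(record)]


compiled = compile_rules(RULES_XML)
print(run_rules(compiled, {"atom_count": 0, "net_charge": 0}))
print(run_rules(compiled, {"atom_count": 5, "net_charge": -1}))
```

Keeping the rules as data means a new check is added by editing the XML, without touching the code that runs it.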

Standardization rules

Corrections are stored in a database.

They comprise SMIRKS-based corrections and proximity-based metal–non-metal reconnection.
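Proximity-based reconnection can be pictured as adding a bond between a metal atom and any candidate non-metal atom that lies within a cutoff distance. The sketch below is one plausible reading, not the CVSP implementation: the element sets, the 2D coordinates and the `MAX_DISTANCE` cutoff are all assumptions. SMIRKS-based corrections need a cheminformatics toolkit and are not shown.

```python
import math

METALS = {"Li", "Na", "K", "Mg", "Ca", "Fe", "Cu", "Zn"}   # assumed, truncated
NON_METALS = {"N", "O", "S", "P", "F", "Cl", "Br", "I"}    # assumed candidates
MAX_DISTANCE = 2.5  # assumed cutoff, in the coordinate units of the input


def reconnect_metals(atoms, bonds, max_distance=MAX_DISTANCE):
    """Add metal to non-metal bonds for atom pairs closer than max_distance.

    atoms: list of (symbol, x, y); bonds: set of frozenset({i, j}).
    Returns a new set of bonds; the input set is left unchanged.
    """
    new_bonds = set(bonds)
    for i, (sym_i, xi, yi) in enumerate(atoms):
        if sym_i not in METALS:
            continue
        for j, (sym_j, xj, yj) in enumerate(atoms):
            if sym_j not in NON_METALS:
                continue
            if math.hypot(xi - xj, yi - yj) <= max_distance:
                new_bonds.add(frozenset((i, j)))
    return new_bonds


# A C-O fragment with a sodium atom drawn close to the oxygen.
atoms = [("C", 0.0, 0.0), ("O", 1.2, 0.0), ("Na", 3.0, 0.0)]
bonds = {frozenset((0, 1))}

print(sorted(sorted(b) for b in reconnect_metals(atoms, bonds)))
```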

Case study: DrugBank
  • DrugBank (http://www.drugbank.ca/) maintained by David Wishart
  • Database contains 6711 structures
  • Widely regarded as a well curated, high quality dataset

DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS. Nucleic Acids Res., 2011, 39 (Jan), D1035-41.

ChemSpider Standardization
  • The entire ChemSpider database will be standardized using a modified FDA rule set
  • Original Molfiles will be standardized and all properties (predicted properties, SMILES, InChIs, names) will be regenerated
  • Standardization procedures will be applied automatically to all future depositions
CVSP as a Flexible System
  • There will be various rule sets
    • Rigid pre-defined rules, e.g. meeting FDA specifications as written, the Open PHACTS modified rule set, etc.
    • Flexible user-defined rules: users upload their rules in our custom XML format
    • The Open PHACTS rule set will be open to the community to reuse
Incorporating CVSP into data processing platforms: KNIME
  • The workflow includes:
    • An SDF reader
    • Indigo nodes
    • Calls to the ChemSpider validation web services
Incorporating CVSP into data processing platforms: KNIME
  • A warning is returned as the result of processing
Summary
  • Will feed the DrugBank results back
  • Alpha version of CVSP available: http://cv.beta.rsc-us.org/Batches.aspx
  • Will be a resource for the Community
  • Will improve ChemSpider
  • Still a long way to go…

Thank you

Email: chemspider@rsc.org

Twitter: ChemSpider

http://www.chemspider.com

http://cssp.chemspider.com/