The collection, curation and modeling of Open Melting Point measurements

5th Meeting on U.S. Government Chemical Databases and Open Chemistry

Jean-Claude Bradley

Andrew Lang

Antony Williams

Department of Chemistry

Drexel University

ChemSpider

Royal Society of Chemistry

Department of Mathematics

Oral Roberts University

August 26, 2011


The Problem of Data Quality in Chemistry

  • Lack of provenance

  • Reliance on a system of “trusted sources”

In the case of melting points:

  • CRC Handbook

  • Merck Index

  • Chemical Vendor Catalogs (e.g. Sigma-Aldrich)

  • Peer-Reviewed Journals


Strategy for the curation of melting points

Rely on redundancy when possible

Provide the maximum level of provenance when necessary (Open Notebook Science)

Adhere to Open Data, Open Descriptors and Open Algorithms for measurements and modeling

Using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance


The Chemical Information Validation Sheet

567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course


Investigating the m.p. inconsistencies of EGCG


Most popular data sources


Alfa Aesar donates melting points to the public


Open Melting Point Explorer


Outliers

EPA/PhysProp (also donated all of its data to the public)

MDPI dataset


Outliers for ethanol: Alfa Aesar and Oxford MSDS
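Redundancy is what makes outliers like these visible. A minimal sketch of one way to flag them, assuming each compound's measurements are collected per source in degrees Celsius: a value is flagged when it lies more than a tolerance away from the median of all reported values. The function name `flag_outliers`, the 5 °C default tolerance (borrowed from the range used for the double+ validated set) and the three-value minimum are illustrative assumptions, not the procedure used in this work.

```python
from statistics import median


def flag_outliers(measurements: dict[str, float], tolerance_c: float = 5.0) -> list[str]:
    """Return the sources whose value deviates from the median by more than tolerance_c.

    measurements maps a source label to a melting point in degrees Celsius.
    Hypothetical policy: with fewer than three values the median cannot tell
    which value is wrong, so nothing is flagged.
    """
    if len(measurements) < 3:
        return []
    consensus = median(measurements.values())
    return sorted(
        source
        for source, value in measurements.items()
        if abs(value - consensus) > tolerance_c
    )
```

The median is used instead of the mean so that a single wildly wrong value (for example one recorded in the wrong unit) does not drag the consensus toward itself.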


Inconsistencies and SMILES problems within MDPI dataset


MDPI Dataset labeled with High Trust Level


EPA/PHYSPROP Structure Errors (Incorrect Valence): 2315 out of 43543 records contained pentavalent nitrogen atoms


EPA/PHYSPROP Errors: The structure displayed is for neutral dopamine, but the associated CAS Number and chemical name in the file are for the hydrobromide salt.
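Mismatches of this kind can sometimes be caught mechanically, because SMILES writes disconnected components (such as a bromide counterion) separated by a `.`. The sketch below is a deliberately crude heuristic, assuming each record carries a chemical name and a SMILES string: it flags records whose name says hydrobromide, hydrochloride and so on while the SMILES has only one component. The term list and the name `salt_name_structure_mismatch` are hypothetical; a real curation pipeline would rely on a cheminformatics toolkit rather than string matching.

```python
# Hypothetical, incomplete list of name fragments that indicate a salt.
SALT_TERMS = ("hydrobromide", "hydrochloride", "hydroiodide", "sodium salt", "potassium salt")


def salt_name_structure_mismatch(name: str, smiles: str) -> bool:
    """Return True when the name describes a salt but the SMILES is a single component."""
    name_says_salt = any(term in name.lower() for term in SALT_TERMS)
    # In SMILES, "." separates disconnected components, e.g. a base and its counterion.
    smiles_has_counterion = "." in smiles
    return name_says_salt and not smiles_has_counterion
```

For the dopamine case, `salt_name_structure_mismatch("dopamine hydrobromide", "NCCc1ccc(O)c(O)c1")` returns `True`, while the two-component SMILES `NCCc1ccc(O)c(O)c1.Br` does not trigger the flag.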


Common errors in datasets

multiple melting points for the same compound in the same database

stereochemistry issues

sign inversion

conversion errors (Kelvin/Celsius, Fahrenheit/Celsius)

bad SMILES (non-rendering)

salts associated with SMILES for free base

using boiling point for melting point
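Several of these errors leave a recognizable numerical fingerprint once a consensus value from other sources exists. A minimal sketch, assuming a suspect value reported in degrees Celsius and a consensus value in degrees Celsius: it checks whether reading the suspect value as Kelvin, as Fahrenheit, or with its sign flipped brings it within a tolerance of the consensus. The function name `diagnose_conversion_error`, the candidate list and the 2 °C tolerance are assumptions for illustration.

```python
def diagnose_conversion_error(
    reported_c: float, consensus_c: float, tolerance_c: float = 2.0
) -> list[str]:
    """Return the candidate explanations that reconcile reported_c with consensus_c."""
    if abs(reported_c - consensus_c) <= tolerance_c:
        return []  # the value already agrees; nothing to explain
    candidates = {
        # The number was really a Kelvin temperature labeled as Celsius.
        "recorded in Kelvin": reported_c - 273.15,
        # The number was really a Fahrenheit temperature labeled as Celsius.
        "recorded in Fahrenheit": (reported_c - 32.0) * 5.0 / 9.0,
        # The minus sign was lost or added.
        "sign inverted": -reported_c,
    }
    return [
        label
        for label, corrected in candidates.items()
        if abs(corrected - consensus_c) <= tolerance_c
    ]
```

The check is only meaningful for values already flagged as outliers; near 0 °C, for instance, a sign inversion can match by coincidence.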


Open melting point datasets

Double+ validated: 2706 compounds (7413 highly curated measurements; range: 0.01-5 °C; compounds that had at least one chiral center, possessed cis/trans isomerism, or were inorganic or a salt were removed)

Entire dataset: 19933 unique compounds (27684 measurements – no inorganics or salts)
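The double+ validation criterion can be sketched as a filter over pooled measurements. This assumes that the slide's range refers to the spread (maximum minus minimum) between measurements of the same compound, which the transcript does not state explicitly, and that the structural exclusions (chiral centers, cis/trans isomerism, inorganics, salts) were already applied upstream, since they need a cheminformatics toolkit. Reporting the mean of the surviving values is likewise an assumption, and the name `double_plus_validated` is hypothetical.

```python
from statistics import mean


def double_plus_validated(
    measurements: dict[str, list[float]],
    max_range_c: float = 5.0,
    min_range_c: float = 0.0,
) -> dict[str, float]:
    """Keep compounds with at least two measurements whose spread is within bounds.

    measurements maps a compound identifier to its melting points in degrees Celsius.
    The slide's 0.01 degree lower bound can be passed as min_range_c.
    Returns the mean melting point of each retained compound.
    """
    kept: dict[str, float] = {}
    for compound, values in measurements.items():
        if len(values) < 2:
            continue  # a single measurement cannot be cross-validated
        spread = max(values) - min(values)
        if min_range_c <= spread <= max_range_c:
            kept[compound] = mean(values)
    return kept
```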


Open Models with Open Data Using Open Descriptors (CDK)


Modeling Results


Melting point prediction service


Melting point predictions and measurements on iPhone/iPad (Alex Clark)


Publication of the double+ validated melting point dataset on Nature Precedings and Lulu


For all Formats of ONS Projects


Open Melting Point Datasets

Currently 20,000 compounds with Open MPs


Some melting points can’t be resolved with the literature alone

Example: 4-benzyltoluene


Motivation: Faster Science, Better Science


Open Lab Notebook page measuring the melting point of 4-benzyltoluene


Using melting point for temperature dependent solubility prediction
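The slide does not give the equation used. One standard textbook route from melting point to temperature-dependent solubility is the ideal solubility equation, ln x = -(ΔH_fus/R)(1/T - 1/T_m), where x is the mole-fraction solubility, T the temperature and T_m the melting point (both in kelvin), combined with Walden's rule ΔH_fus ≈ T_m·ΔS_fus with ΔS_fus ≈ 56.5 J/(mol·K). This is a swapped-in illustration that ignores solute-solvent interactions, not necessarily the model behind the slide; the function name is hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)
WALDEN_DS_FUS = 56.5  # Walden's rule estimate of the entropy of fusion, J/(mol*K)


def ideal_mole_fraction_solubility(
    mp_c: float, temp_c: float, ds_fus: float = WALDEN_DS_FUS
) -> float:
    """Ideal mole-fraction solubility of a solid melting at mp_c, at temperature temp_c (both in Celsius)."""
    tm = mp_c + 273.15
    t = temp_c + 273.15
    if t >= tm:
        # At or above its melting point the solute is a liquid; the ideal model predicts full miscibility.
        return 1.0
    # ln x = -(dH_fus / R) * (1/T - 1/Tm) with dH_fus = Tm * dS_fus,
    # which simplifies to -(dS_fus / R) * (Tm/T - 1).
    return math.exp(-(ds_fus / R) * (tm / t - 1.0))
```

The form makes the melting point's role explicit: at a fixed temperature, a higher melting point gives a lower predicted solubility, and warming the solution toward T_m raises it.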


Crowdsourcing Solubility Data


Integration of Multiple Web Services to Recommend Solvents for Reactions


All ONS web services


Google Apps Scripts web services


Google Apps Scripts for conveniently exploring melting point data


Comparison of model predictions with triple-validated measurements

Straight-chain carboxylic acids from 1 to 10 carbons

Straight-chain alcohols from 1 to 10 carbons


Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation: only a single source available)
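A comparison like this can be summarized with a small helper. A minimal sketch, assuming each row holds a compound name, its validated measured melting point, the model's prediction (both in degrees Celsius) and the number of independent sources: it reports the mean absolute error over compounds with at least `min_sources` sources and lists the rest for further validation, as was done for cyclobutylamine. The `Comparison` class, the `summarize` function and the default of three sources (echoing "triple validated") are assumptions.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Comparison:
    compound: str
    measured_c: float   # validated experimental melting point, degrees Celsius
    predicted_c: float  # model prediction, degrees Celsius
    n_sources: int      # number of independent sources behind measured_c


def summarize(rows: list[Comparison], min_sources: int = 3) -> tuple[float, list[str]]:
    """Return (mean absolute error over well-sourced rows, compounds flagged for validation)."""
    validated = [r for r in rows if r.n_sources >= min_sources]
    flagged = [r.compound for r in rows if r.n_sources < min_sources]
    # NaN signals that no row had enough sources to compute an error.
    mae = mean(abs(r.predicted_c - r.measured_c) for r in validated) if validated else math.nan
    return mae, flagged
```

Keeping poorly sourced compounds out of the error statistic avoids scoring the model against a measurement that may itself be wrong.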


Google Apps Scripts for planning reactions and creating schemes


Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)


Conclusions

  • For science to progress quickly, there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance

  • Open Notebook Science offers an efficient way to make research transparent and discoverable

