
Semantics and standards in chemistry: Project Prospect, ChemSpider and chemical data


janice

Presentation Transcript


  1. Semantics and standards in chemistry: Project Prospect, ChemSpider and chemical data

  2. Anomalocaris (image: floreslivroselua.files.wordpress.com)

  3. What do our scientists want? (apart from PDFs?)

  4. Chemists like structures: digitonin

  5. Three years of semantic publishing
     What were we trying to improve?
     • Discoverability
     • Use
     • Understanding
     • Linking
     And why...
     • What chemistry on the web may become...
     • Prolonged exposure to Peter Murray-Rust

  6. Quick, what can we mark up?
     What standards did we have in 2007?
     • InChI – for some compounds
     • ChEBI – for some compounds and groups of compounds
     • Gene/Sequence/Cell Ontologies
     • IUPAC Gold Book (dictionary, really, but online)
     And RDF/OWL as distribution format
     30-40% of our publishing

  7. RSS for human readers

  8. RSS for computers
     <item rdf:about="http://xlink.rsc.org/?DOI=b716356h&amp;RSS=1">
       <title> [… title] </title>
       <link>http://xlink.rsc.org/?DOI=b716356h&amp;RSS=1</link>
       <description> [… blah] </description>
       <content:encoded> [… human-readable stuff] </content:encoded>
       [… dublin core stuff …]
       <content:items>
         <rdf:Bag>
           <rdf:li>
             <content:item rdf:about="info:inchi/InChI=1/C22H22NO4/c1-13-16-11-21(26-4)20(25-3)10-15(16)8-18-17-12-22(27-5)19(24-2)9-14(17)6-7-23(13)18/h6-12H,1-5H3/q+1"/>
           </rdf:li>
           <rdf:li>
             <content:item rdf:about="http://purl.org/obo/owl/SO#SO:0000028"/>
           </rdf:li>
         </rdf:Bag>
       </content:items>
     </item>
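As a rough illustration of what "RSS for computers" makes possible, the Python sketch below reads a feed item like the one on slide 8 and separates InChI identifiers from ontology term URIs. It rests on several assumptions: the slide does not show the feed's namespace declarations, so the standard RSS 1.0, RDF and content-module URIs are used here; `SAMPLE_ITEM` is a trimmed stand-in with a placeholder title; and `extract_identifiers` is a hypothetical helper name, not part of any RSC tooling.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the slide omits the feed's xmlns declarations.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Trimmed stand-in for the item on slide 8 (title is a placeholder).
SAMPLE_ITEM = """\
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <item rdf:about="http://xlink.rsc.org/?DOI=b716356h&amp;RSS=1">
    <title>placeholder title</title>
    <content:items>
      <rdf:Bag>
        <rdf:li>
          <content:item rdf:about="info:inchi/InChI=1/C22H22NO4/c1-13-16-11-21(26-4)20(25-3)10-15(16)8-18-17-12-22(27-5)19(24-2)9-14(17)6-7-23(13)18/h6-12H,1-5H3/q+1"/>
        </rdf:li>
        <rdf:li>
          <content:item rdf:about="http://purl.org/obo/owl/SO#SO:0000028"/>
        </rdf:li>
      </rdf:Bag>
    </content:items>
  </item>
</rdf:RDF>
"""


def extract_identifiers(feed_xml):
    """Return the InChIs and ontology URIs listed in every rdf:Bag."""
    root = ET.fromstring(feed_xml)
    about = "{%s}about" % NS["rdf"]  # namespaced attribute name
    found = {"inchi": [], "ontology": []}
    for node in root.findall(".//rdf:Bag/rdf:li/content:item", NS):
        uri = node.get(about, "")
        if uri.startswith("info:inchi/"):
            # Drop the info: URI prefix to leave the bare InChI string.
            found["inchi"].append(uri[len("info:inchi/"):])
        else:
            found["ontology"].append(uri)
    return found


if __name__ == "__main__":
    for kind, values in extract_identifiers(SAMPLE_ITEM).items():
        print(kind, values)
```

Because the identifiers sit in attributes rather than in the human-readable description, a consumer never has to parse prose to learn which compounds and ontology terms an article mentions.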

  9. Enhanced HTML
     Database
     Text mining (Oscar): http://www.sciborg.org.uk/ http://oscar3-chem.sourceforge.net/
     Manual QA
     Enhanced RSS

  10. How many numbered compounds actually are named in a given paper?
      • iloprost (1)
      • tributyl-1-hexynylstannane (2)
      • the desired 2-heptyne (3)
      • methyl–Pd(II) iodide 4 or 4′
      • alkynylstannane 5
      • the hypervalent stannate 6
      • (alkynyl)(methyl)Pd(II) complex 7
      • the desired methylalkyne 8
      • compounds 9–14
      • the stannyl precursors 15 and 16
      • methylated compounds 17 and 18
      • stannyl precursor 19
      • iloprost methyl ester 20
      Why is this hard?
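One way to see why slide 10 is hard is to try the obvious approach. The Python sketch below is a deliberately naive, hypothetical helper (`expand_labels` and `LABEL_TAIL` are illustrative names, not part of Oscar or Prospect) that reads only the compound label at the end of a phrase and expands ranges such as "9–14". It copes with the phrases on the slide, but only because it assumes the label always comes last.

```python
import re

# Label(s) at the very end of a phrase: "(1)", "4 or 4′", "9–14", "15 and 16".
# Assumption: labels are trailing; locants such as the "2" in "2-heptyne"
# are skipped only because they are not at the end of the phrase.
LABEL_TAIL = re.compile(
    r"\(?(\d+[′']?(?:\s*(?:–|-|and|or|,)\s*\d+[′']?)*)\)?\s*$"
)


def expand_labels(phrase):
    """Return the compound labels found at the end of `phrase`."""
    match = LABEL_TAIL.search(phrase)
    if match is None:
        return []
    labels = []
    # Split on list separators first, then expand any numeric range.
    for part in re.split(r"\s*(?:\band\b|\bor\b|,)\s*", match.group(1)):
        bounds = re.fullmatch(r"(\d+)\s*[–-]\s*(\d+)", part)
        if bounds:
            low, high = int(bounds.group(1)), int(bounds.group(2))
            labels.extend(str(n) for n in range(low, high + 1))
        else:
            labels.append(part)
    return labels


if __name__ == "__main__":
    for text in (
        "iloprost (1)",
        "the desired 2-heptyne (3)",
        "methyl–Pd(II) iodide 4 or 4′",
        "compounds 9–14",
        "the stannyl precursors 15 and 16",
    ):
        print(text, "->", expand_labels(text))
```

Even when the labels are recovered, the sketch says nothing about which structure each label denotes: "compounds 9–14" names six compounds without giving a single chemical name, and a name that happens to end in a locant would be misread as a label. That gap is where the manual cleanup effort goes.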

  11. Text mining is the easy bit
      • Cleaning up afterwards is hard
      When quality is important, more time is spent on cleaning than on mining

  12. Annotation: where and when?
      • Pre-publication? (by authors): ?
      • At publication? (by editors): Prospect
      • After publication? (by the crowd): ChemMantis

  13. What if the authors did it all?

  14. Ontology Add-in for Word 2007
      Services: Ontology download web service
      John Wilbanks • Phil Bourne • Lynn Fink
      Intent: Term recognition & disambiguation based on OBO or OWL formats
      Relationships: Ontology browser
      Source code and binary: http://research.microsoft.com/ontology/

  15. Chem4Word – Chemistry Drawing in Word
      Authoring: Author and edit 1D and 2D chemistry.
      Intent: Recognizes chemical dictionary and ontology terms
      Relationships: Navigate and link referenced chemistry
      Intelligence: Verifies validity of authored chemistry
      Available soon: http://research.microsoft.com/chem4word/
      Data: Semantics stored in Chemistry Markup Language
      <?xml version="1.0" ?>
      <cml version="3" convention="org-synth-report" xmlns="http://www.xml-cml.org/schema">
        <molecule id="m1">
          <atomArray>
            <atom id="a1" elementType="C" x2="-2.9149999618530273" y2="0.7699999809265137" />
            <atom id="a2" elementType="C" x2="-1.5813208400249916" y2="1.5399999809265137" />
            <atom id="a3" elementType="O" x2="-0.24764171819695613" y2="0.7699999809265134" />
            <atom id="a4" elementType="O" x2="-1.5813208400249912" y2="3.0799999809265137" />
            <atom id="a5" elementType="H" x2="-4.248679083681063" y2="1.5399999809265137" />
            <atom id="a6" elementType="H" x2="-2.914999961853028" y2="-0.7700000190734864" />
            <atom id="a7" elementType="H" x2="-4.248679083681063" y2="-1.907348645691087E-8" />
            <atom id="a8" elementType="H" x2="1.0860374036310796" y2="1.5399999809265132" />
          </atomArray>
          <bondArray>
            <bond atomRefs2="a1 a2" order="1" />
            <bond atomRefs2="a2 a3" order="1" />
            <bond atomRefs2="a2 a4" order="2" />
            <bond atomRefs2="a1 a5" order="1" />
            <bond atomRefs2="a1 a6" order="1" />
            <bond atomRefs2="a1 a7" order="1" />
            <bond atomRefs2="a3 a8" order="1" />
          </bondArray>
        </molecule>
      </cml>
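To show what storing the semantics in CML buys a downstream program, the Python sketch below reads a molecule like the one on slide 15, derives its molecular formula, and checks that every bond points at a declared atom. This is an illustrative sketch, not Chem4Word code: `hill_formula` and `dangling_bond_refs` are hypothetical helper names, and the sample omits the 2D coordinates because the calculation does not need them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# The molecule from slide 15 with x2/y2 coordinates left out for brevity.
SAMPLE_CML = """\
<cml version="3" convention="org-synth-report" xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C" /> <atom id="a2" elementType="C" />
      <atom id="a3" elementType="O" /> <atom id="a4" elementType="O" />
      <atom id="a5" elementType="H" /> <atom id="a6" elementType="H" />
      <atom id="a7" elementType="H" /> <atom id="a8" elementType="H" />
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1" /> <bond atomRefs2="a2 a3" order="1" />
      <bond atomRefs2="a2 a4" order="2" /> <bond atomRefs2="a1 a5" order="1" />
      <bond atomRefs2="a1 a6" order="1" /> <bond atomRefs2="a1 a7" order="1" />
      <bond atomRefs2="a3 a8" order="1" />
    </bondArray>
  </molecule>
</cml>
"""


def hill_formula(cml_text):
    """Molecular formula in Hill order, counted from the explicit atoms."""
    root = ET.fromstring(cml_text)
    counts = Counter(
        atom.get("elementType") for atom in root.iterfind(".//cml:atom", CML_NS)
    )
    if "C" in counts:
        # Hill order: carbon, then hydrogen, then the rest alphabetically.
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def dangling_bond_refs(cml_text):
    """List atom ids referenced by bonds but missing from the atomArray."""
    root = ET.fromstring(cml_text)
    known = {atom.get("id") for atom in root.iterfind(".//cml:atom", CML_NS)}
    missing = []
    for bond in root.iterfind(".//cml:bond", CML_NS):
        missing.extend(r for r in bond.get("atomRefs2", "").split() if r not in known)
    return missing


if __name__ == "__main__":
    print(hill_formula(SAMPLE_CML))
    print(dangling_bond_refs(SAMPLE_CML))
```

For the slide's molecule the atom counts give C2H4O2, and the bond pattern (a methyl carbon, a C=O and a C–O–H) is consistent with acetic acid. A drawing stored as an image would support neither check.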

  16. What if the readers did it?

  17. ChemSpider ChemMantis

  18. Deposit structures…build dictionaries

  19. So now it’s 2010 – where are we?
      4k articles marked up, 40k compounds
      RSC open ontology development
      • Methods, reactions, molecular processes
      User interface
      • Partially publishing platform dependent
      • Do we have the answers?

  20. Compelling ontology browsing?

  21. Remaining challenges
      Open problems
      • Chemical structures from images
      • Productive identifiers
      • Degree of manual effort required
      Putting ChemMantis and Prospect together
      • Backfile (to 1841)
      • Community curation

  22. Standards = longevity
      Help implement and develop standards
      • Open ontologies for chemistry
      • InChI Trust
      • How to publish this - pre-competition

  23. Addressing a real need in standards
      Pistoia Alliance: “An initiative to provide an open foundation of data standards, ontologies and web-services to streamline the Pharmaceutical Drug Discovery workflow”
      Semantic Enrichment of the Scientific Literature (SESL), Oct09-Oct10
      • Pistoia Alliance-funded
      • EBI
      • Elsevier, NPG, OUP, RSC

  24. What have been the benefits? For readers? For authors? For RSC?

  25. How to use this information better to benefit existing researchers – computers and humans
      • Real behaviour (for humans)
      • Clear requirements (for computer discovery)

  26. What do humans want? As few interfaces as possible (image: media.obsessable.com)

  27. What do computers want? Web services (image: flickr.com/photos/microcosmos)

  28. A free-to-access online database for chemists
      • Website and web services
      • Links over 20 million compounds integrated to <300 data sources
      • A curation platform for the public to improve the quality of data online
      • A deposition platform for the public to annotate and extend the data

  29. What’s the status of chemistry online?
      • Encyclopedic articles (Wikipedia)
      • Chemical vendor databases
      • Metabolic pathway databases
      • Virtual Screening databases
      • Property databases
      • Screening assay results
      • Patents with chemical structures (IBM & SureChem)
      • ADME/Tox data
      • Scientific publications
      • Compound aggregators
      • Blogs/Wikis and Open Notebook Science
      • Other publishers’ databases

  30. Caution! Question Everything!

  31. ChemSpider SyntheticPages

  32. Quality and cleanup
      • Who says what Taxol is?
      • What is the “timeline” for a molecule?
      • How do we clean up the public data?
      • Not even experts can agree (and the detective work can take days or weeks). See taxol, digitonin

  33. Crowd-sourcing chemistry curation
      • Identify/tag errors, edit names and synonyms, identify records to deprecate

  34. Future of chemistry online?
      • Make the internet searchable by chemical structure and substructure by a free online service
      • Aggregate and help improve disparate public sources
      • Highlight our (and other publishers’) high quality publications
      • Test sharing and discussion of research data in the open
      • Provide structural home to preserve researchers’ collections, experimental and property data

  35. A society’s business...
      • Develop standards
      • Test, implement, refine, promote best practice
      • Evolve (don’t forget the human behaviour)
