
Klaus Gubernator, Craig James, eMolecules Inc. ACS 232nd National Meeting

Chemical Structure Search Engines in Cyberspace. Klaus Gubernator, Craig James, eMolecules Inc. ACS 232nd National Meeting, Division of Chemical Information, San Francisco, September 14, 2006. Chemistry on the Internet. The web has revolutionized the way we retrieve information

Presentation Transcript


  1. Chemical Structure Search Engines in Cyberspace • Klaus Gubernator, Craig James, eMolecules Inc. • ACS 232nd National Meeting, Division of Chemical Information • San Francisco, September 14, 2006

  2. Chemistry on the Internet • The web has revolutionized the way we retrieve information • Chemistry is a late participant in this revolution

  3. Search Google Images for “Aspirin”

  4. Structure of “triazolo thiadiazine” • http://scripts.iucr.org/cgi-bin/paper?cnor=a12172&buy=yes • Acta Cryst. (1975). B31, 1427-1429 • The crystal structure of 7-amino-2H,4H-vic-triazolo[4,5-c]-1,2,6-thiadiazine 1,1-dioxide (ATT) • C. Foces-Foces, F. H. Cano and S. García-Blanco • Buy online: You may purchase this article in PDF and/or HTML formats. For purchasers in the UK, and for purchasers elsewhere in the European Community who do not have a VAT number, VAT will be added to the price of the article. • Format*: PDF (US $40, plus US $7 for EC purchases)

  5. Cheminformatics.org • Datasets (which are, in contrast to other dataset lists, available in a structural format) • This list will be expanded continuously. Please don't hesitate to make published datasets publicly available here. • Currently available: 44 Datasets • Note: The Briem/Lessel and Hert/Willett datasets are only available as MDDR IDs due to license reasons. Please contact MDL for further information on the database. The datasets have nonetheless been included here because they are standard datasets for similarity searching. – Andreas Bender • Binary (active/inactive) datasets • QSAR datasets • QSPR datasets • Toxicity datasets • Metabolism datasets • Permeability datasets • Docking datasets • Mechanistic datasets • Mixed/Other datasets

  6. Stahl dataset
     CS(=O)(=O)Nc1ccc(cc1OC2CCCCC2)N(=O)=O
     CS(=O)(=O)Nc1cc2CCC(=O)c2cc1Oc3ccc(F)cc3F
     CS(=O)(=O)Nc1cc2CCC(=O)c2cc1Sc3ccc(F)cc3F
     CS(=O)(=O)Nc1ccc(cc1Sc2ccc(F)cc2F)C(=O)N
     CS(=O)(=O)Nc1ccc(cc1Sc2ccc(Cl)cc2Cl)S(=O)(=O)N
     COc1ccc(cc1)c2sc(nc2c3ccc(cc3)S(=O)(=O)C)c4ccccc4Cl
     COc1ccc(cc1)c2sc(nc2c3ccc(SC)cc3)c4ccccc4Cl
     CS(=O)(=O)c1ccc(cc1)n2nc(cc2c3ccc(F)cc3)C(F)(F)F
     CS(=O)(=O)c1ccc(cc1)n2nc(cc2c3ccc(Br)cc3)C(F)(F)F
     Cc1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F
     CS(=O)(=O)c1ccc(cc1)c2snnc2c3ccc(F)cc3
     CC(=O)c1nc(c(o1)c2ccc(c(F)c2)S(=O)(=O)N)c3ccccc3
     Cc1nc(C2CCCCC2)c(o1)c3ccc(c(F)c3)S(=O)(=O)N
     CS(=O)(=O)c1ccc(cc1)c2[nH]c(nc2C3CCCCC3)C(F)(F)F
     CS(=O)(=O)c1ccc(cc1)c2[nH]c(nc2c3ccc(F)cc3)C(F)(F)F
     CS(=O)(=O)c1ccc(cc1)C2=C(C(=O)OC32CC3)c4ccccc4
     CS(=O)(=O)c1ccc(cc1)C2=C(C(=O)OC32CCCC3)c4ccccc4
     CS(=O)(=O)c1ccc(cc1)c2cnn(Cc3ccccc3)c(=O)c2c4ccccc4
     CS(=O)(=O)c1ccc(cc1)c2nn(Cc3ccccc3)c(c2c4ccc(F)cc4)C(F)(F)F
     NS(=O)(=O)c1ccc(cc1)c2c(CO)onc2c3ccccc3
     CS(=O)(=O)c1ccc(cc1)c2cc(Cl)nn2c3ccc(F)cc3
     NS(=O)(=O)c1ccc(cc1)c2cc(nn2c3ccc(F)cc3)C(F)(F)F
     NS(=O)(=O)c1ccc(cc1)n2nc(cc2c3nc4cccc(F)c4s3)C(F)F

  7. Yokoyama dataset
     Unnamed
     -MTS- 06200418093D 0 0.00000 0.00000 0
     13 13 0 0 0 0 0 0 0 0 1 V2000
     0.0180 -0.0030 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0
     1.7880 0.0070 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
     2.4880 0.0110 1.2120 C 0 0 0 0 0 0 0 0 0 0 0 0
     3.8880 0.0200 1.2120 C 0 0 0 0 0 0 0 0 0 0 0 0
     4.5880 0.0240 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
     6.0030 0.0330 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
     6.6610 1.1880 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
     6.0400 2.2500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
     8.1410 1.1970 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
     8.7570 0.1440 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
     8.7890 2.3360 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
     3.8880 0.0200 -1.2130 C 0 0 0 0 0 0 0 0 0 0 0 0
     2.4880 0.0120 -1.2120 C 0 0 0 0 0 0 0 0 0 0 0 0
     1 2 1 0 0 0 0
     2 3 2 0 0 0 0
     3 4 1 0 0 0 0
     4 5 2 0 0 0 0
     …
     M END
     > <BIO>
     48.00
     $$$$
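The Yokoyama record above is an MDL molfile (V2000) wrapped in an SD-file entry. As a rough illustration of what a program has to do before such a record becomes searchable, here is a minimal sketch that reads the counts line and the atom block. It assumes whitespace-separated fields, whereas the real format uses fixed-width columns, so a production parser should slice by column. The function name `parse_atoms` is hypothetical, and the sample holds only the first three atoms of the slide's record with the counts adjusted to match.

```python
# Sketch only: reads the counts line and atom block of a V2000 molfile.
# Assumption: fields are whitespace-separated (the real format is fixed-width).
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z)


def parse_atoms(molfile: str) -> List[Atom]:
    lines = molfile.splitlines()
    # Find the counts line by its V2000 tag rather than by position,
    # so a missing or extra header line does not break the sketch.
    start = next(i for i, ln in enumerate(lines) if ln.rstrip().endswith("V2000"))
    n_atoms = int(lines[start].split()[0])
    atoms: List[Atom] = []
    for ln in lines[start + 1 : start + 1 + n_atoms]:
        x, y, z, element = ln.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


# First three atoms of the slide's record; atom and bond counts are
# reduced to 3 and 0 so the fragment is self-consistent.
SAMPLE = """Unnamed
  -MTS-

  3  0  0  0  0  0  0  0  0  0  1 V2000
    0.0180   -0.0030    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    1.7880    0.0070    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.4880    0.0110    1.2120 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
"""

if __name__ == "__main__":
    for atom in parse_atoms(SAMPLE):
        print(atom)
```

Compared with the one-line SMILES strings of the Stahl dataset, the same kind of molecule here takes dozens of lines, which is one reason a web search engine cannot simply treat structure files as text.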

  8. Search Genbank for “aattccgg”

  11. Why is there so little chemistry on the web? • Tradition? • Strong providers of subscription services? • Searching for chemical structures is significantly more difficult than text searching? • Chemical identifiers are not standardized?

  12. Open Access Chemical Search Engines • PubChem – NIH • ChemBank – Harvard • ZINC – UCSF • ChemDB – UC Irvine • ChemExper – Lausanne • ChemFinder – CambridgeSoft

  13. www.emolecules.com • New Chemistry Search Engine • A large database of publicly available molecular structures • Launched November 2005 • 50,000 searches per month, rapidly growing

  14. www.emolecules.com Free chemistry search site for publicly available chemical information

  15. Advanced Search Powerful features: • hit list management • union, intersect, subtract, difference • manual selection • export lists in many formats • persistent hitlists
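The hit list operations named on this slide map naturally onto set algebra. The following is a minimal sketch, not the site's implementation: hit lists are modeled as Python sets of made-up structure IDs, and because the slide does not define the four operations, it assumes "subtract" means A minus B and "difference" means the symmetric difference.

```python
# Sketch: hit lists modeled as sets of structure IDs (the IDs are invented).
from typing import Callable, Dict, Set

HitList = Set[int]

OPERATIONS: Dict[str, Callable[[HitList, HitList], HitList]] = {
    "union": lambda a, b: a | b,        # in either list
    "intersect": lambda a, b: a & b,    # in both lists
    "subtract": lambda a, b: a - b,     # assumption: in A but not in B
    "difference": lambda a, b: a ^ b,   # assumption: in exactly one list
}


def combine(op: str, a: HitList, b: HitList) -> HitList:
    """Apply one named hit list operation and return a new hit list."""
    return OPERATIONS[op](a, b)


if __name__ == "__main__":
    hits_a = {101, 102, 103, 104}
    hits_b = {103, 104, 105}
    for name in OPERATIONS:
        print(name, sorted(combine(name, hits_a, hits_b)))
```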

  17. Content: 16M entries, 5.6M structures • Academic and government databases: NIST WebBook, DrugBank, Protein Ligands • Chemical suppliers: 150 electronic catalogs included • Future goal: all publicly available chemical information

  18. Why is it so fast? • Novel chemical search engine technology • Method represents a major departure from previously known algorithms: • Molecular keys (MDL) • Fingerprints (Daylight) • Feature Trees (BioSolveIT)
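For context on the earlier approaches the slide lists, the general idea behind hashed fingerprint screening can be sketched as follows. This is a generic illustration, not Daylight's implementation: the feature strings are hypothetical placeholders, the fingerprint is only 64 bits wide, and `zlib.crc32` stands in for whatever hash a real toolkit uses. A target can contain the query as a substructure only if every bit set in the query fingerprint is also set in the target fingerprint. Hash collisions can produce false positives, so survivors still need an atom-by-atom match.

```python
# Sketch of hashed-fingerprint screening; feature strings are placeholders.
import zlib
from typing import Iterable

N_BITS = 64  # assumption: tiny fingerprint, real ones are much longer


def fingerprint(features: Iterable[str]) -> int:
    """Hash each feature to one bit position and OR the bits together."""
    fp = 0
    for feature in features:
        fp |= 1 << (zlib.crc32(feature.encode("utf-8")) % N_BITS)
    return fp


def may_contain(target_fp: int, query_fp: int) -> bool:
    """True if every query bit is set in the target (a necessary condition)."""
    return (target_fp & query_fp) == query_fp


if __name__ == "__main__":
    target = fingerprint(["C-C", "C-O", "C=O"])
    query = fingerprint(["C-O"])
    print(may_contain(target, query))
```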

  19. Search engine technology • Analyze each molecule for distinguishing structural features • Generate all features algorithmically • Normalize features and use them for indexing • Result: very fast searches
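The slide gives only these four steps and does not say which features are generated or how they are normalized, so the following is a toy sketch under stated assumptions rather than the eMolecules algorithm. It assumes features are linear atom paths of up to three atoms, normalizes each path by picking the lexicographically smaller of its two reading directions, ignores bond orders, and stores features in an inverted index. All names (`path_features`, `build_index`, `candidates`, the tiny library) are hypothetical.

```python
# Toy sketch: feature generation, normalization, and an inverted index.
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A molecule is (atom symbols, bonds as pairs of atom indices).
Molecule = Tuple[List[str], List[Tuple[int, int]]]


def path_features(mol: Molecule, max_len: int = 3) -> Set[str]:
    """Enumerate all linear atom paths up to max_len atoms, direction-normalized."""
    atoms, bonds = mol
    neighbors: Dict[int, List[int]] = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    features: Set[str] = set()

    def walk(path: List[int]) -> None:
        symbols = [atoms[i] for i in path]
        forward = "-".join(symbols)
        backward = "-".join(reversed(symbols))
        features.add(min(forward, backward))  # same key from either end
        if len(path) < max_len:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return features


def build_index(library: Dict[str, Molecule]) -> Dict[str, Set[str]]:
    """Map each normalized feature to the set of molecules containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, mol in library.items():
        for feature in path_features(mol):
            index[feature].add(name)
    return index


def candidates(index: Dict[str, Set[str]], query: Molecule) -> Set[str]:
    """Molecules that contain every feature of the query."""
    postings = [index.get(f, set()) for f in path_features(query)]
    return set.intersection(*postings) if postings else set()


LIBRARY: Dict[str, Molecule] = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "dimethyl_ether": (["C", "O", "C"], [(0, 1), (1, 2)]),
    "propane": (["C", "C", "C"], [(0, 1), (1, 2)]),
}
QUERY: Molecule = (["C", "O"], [(0, 1)])  # a C-O fragment

if __name__ == "__main__":
    print(sorted(candidates(build_index(LIBRARY), QUERY)))
```

Each lookup touches only the posting lists for the query's features, which is why an index of this general shape can answer quickly. In this sketch the result is a candidate set, and a full engine would presumably still confirm each candidate with an exact substructure match.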

  20. Who is eMolecules? • Klaus Gubernator • Craig A. James • Rashmi Mistry

  21. Summary • Free for depositors and users • Very fast search engine • High quality user interface • Rich functionality • Complementary with other engines

  22. Contact Information • Klaus Gubernator • klaus@emolecules.com • Skype: emolecules • +1-858-764-1941
