InstantJChem: a flexible chemical database system
G. Marcou, D. Horvath
Laboratoire d’infochimie, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg
Introduction
• The goal is to present InstantJChem for the storage and manipulation of chemical information
• General presentation
• Database search
• Creation of a database from scratch
What is a database?
• A database stores data on a specific subject in an organized form.
• A relational database stores information in tables that reference one another.
• A relational database management system (RDBMS) is software that manages relational databases.
• InstantJChem is neither a database nor an RDBMS.
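To make the idea of tables that reference one another concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The table names, column names and the single row are hypothetical placeholders, not part of InstantJChem or of the exercise data.

```python
import sqlite3

# In-memory database; every table, column and value below is a hypothetical placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces references only when asked to

conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE measurement ("
    " id INTEGER PRIMARY KEY,"
    " compound_id INTEGER NOT NULL REFERENCES compound(id),"  # the inter-table reference
    " property TEXT NOT NULL,"
    " value REAL NOT NULL)"
)

conn.execute("INSERT INTO compound (id, name) VALUES (1, 'benzene')")
conn.execute(
    "INSERT INTO measurement (compound_id, property, value) "
    "VALUES (1, 'example_property', 1.0)"
)

# A join follows the reference from one table to the other.
rows = conn.execute(
    "SELECT c.name, m.property, m.value "
    "FROM compound c JOIN measurement m ON m.compound_id = c.id"
).fetchall()

# The RDBMS rejects a measurement that points at a compound that does not exist.
try:
    conn.execute(
        "INSERT INTO measurement (compound_id, property, value) VALUES (99, 'x', 0.0)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The RDBMS, not the application, keeps the two tables consistent. A tool such as InstantJChem sits on top of that layer.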
What is InstantJChem?
• InstantJChem is a user-friendly interface between an RDBMS, chemical information and the user.
Diagram labels: User, RDBMS, Chemical Information
Key concepts of InstantJChem
• Projects
• Schema
• Databases and Tables
• Entities
• Data Trees
• Views
Exercise 1
Create a new project named IJCExercises…
Key concept: Project
A project contains resources and connections to one or more databases.
Exercise 1
… and import the file SC100.SDF into it …
Key concept: Schema
A schema (database) contains the connection to a database and special tables (JChemProperties).
Key concept: Database and Tables
The database and its tables are managed by the RDBMS. They are what actually stores the information.
Key concept: Entities
An entity is a representation of data. It provides a single, uniform interface to conceptually different types of tables (Standard, Chemical, SQL, Extractions, etc.).
Key concept: Data Trees
A data tree is a collection of entities and views. It organizes information in a hierarchy (parent-child relationships between entities).
Exercise 1
… Customize a browser for it.
Key concept: Views
A view is an interface to data. For simple data, a spreadsheet view is appropriate. For complex relational data, a form is required.
Exercise 2
In the SC100 database, search for fluorobenzene-containing and pyridine-containing molecules. Use Substructure or Similarity search.
Exercise 2
Results for the two queries, in order:
• Substructure search: 20 hits; Similarity search: 0 hits
• Substructure search: 14 hits; Similarity search: 0 hits
Similarity search uses the Chemical Hashed Fingerprints defined at database creation.
Chemical Hashed Fingerprints (CHF)
• Pattern length: number of bonds in a pattern
• Fingerprint length: total number of bits used to store the fingerprint
• Bits per pattern: number of bits that each pattern sets
An efficient annotation that accelerates structure search.
www.chemaxon.com
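The three parameters above can be illustrated with a toy hashed fingerprint. This is a simplified sketch, not ChemAxon's implementation: it assumes molecules are given as a list of atom labels plus a list of bonds, enumerates only linear paths, ignores bond orders and rings, and uses SHA-256 as an arbitrary hash. The names `linear_paths`, `fingerprint` and `tanimoto` are hypothetical.

```python
import hashlib


def linear_paths(atoms, bonds, max_bonds):
    """Return the set of atom-label paths containing at most max_bonds bonds."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    patterns = set()

    def walk(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        # A path and its reverse are the same pattern: keep one canonical spelling.
        patterns.add(min(forward, backward))
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:
                walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return patterns


def fingerprint(atoms, bonds, pattern_length=3, fp_length=64, bits_per_pattern=2):
    """Hash every pattern onto bits_per_pattern positions of an fp_length-bit integer."""
    bits = 0
    for pattern in linear_paths(atoms, bonds, pattern_length):
        for k in range(bits_per_pattern):
            digest = hashlib.sha256(f"{k}:{pattern}".encode()).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % fp_length)
    return bits


def tanimoto(a, b):
    """Tanimoto coefficient between two bit fingerprints stored as integers."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


# Heavy-atom skeletons only: C-C-O as the query, C-C-C-O as the target.
query_fp = fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
target_fp = fingerprint(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])

# Screening: a target can contain the query only if it has every bit of the query set.
passes_screen = (query_fp & target_fp) == query_fp
similarity = tanimoto(query_fp, target_fp)
```

Every path of a substructure is also a path of the molecule that contains it, so the screen never rejects a true substructure match. It only removes candidates before the slower atom-by-atom comparison. Similarity compares whole fingerprints, so a small query fragment typically scores low against a much larger molecule. That is one plausible reason for a similarity search to return no hits where a substructure search returns many.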
Exercise 3
Combine molecules 25 and 89 into a pseudo-molecule to perform a superstructure query.
Exercise 4
Use compound 46 as a Full query and as a Full fragment query to search the database. Repeat after removing the bromide from the query.
Structure Searches www.chemaxon.com
Exercise 5
Search for benzene-containing compounds whose name contains “pyrimidin” and that are annotated as “Good” for aqueous solubility.
Exercise 6
Search for compounds with at least one aromatic ring containing at least one nitrogen atom.
Exercise 7
Search for compounds with MolWeight > 200 that do not contain a benzene ring.
Exercise 8
Search for compounds with MolWeight > 200, then for compounds without a benzene ring, and take the union of the two hit lists.
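Combining two criteria in a single query (Exercise 7) and taking the union of two hit lists (Exercise 8) give different results. A minimal sketch with Python sets shows the difference. The compound identifiers below are hypothetical and are not taken from SC100.

```python
# Hypothetical hit lists, identified by compound ID.
heavy = {1, 2, 3, 5}        # compounds with MolWeight > 200
no_benzene = {2, 4, 5, 6}   # compounds without a benzene ring

# Exercise 7: both conditions must hold, so the hit lists are intersected.
both_conditions = heavy & no_benzene

# Exercise 8: either condition may hold, so the hit lists are united.
either_condition = heavy | no_benzene
```

The intersection can never be larger than either list, and the union can never be smaller.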
Exercise 9
Search for compounds possessing more than 4 microspecies at pH = 4.0 …
Exercise 9 … Export your hit list.
Exercise 10
Import the file ISICCRsm.RDF into your project …
Exercise 10 … Create a Browser for this database
Exercise 11
Search for reactions with an imidazole ring in their reactants, then in their products.
Exercise 12
Add a new data tree and structure entity named AlkanBoilingPoint to your schema …
Exercise 12 … and add a floating point value field named BoilingPoint.
Exercise 13 Add to the AlkanBoilingPoint entity the following data.
Exercise 14 Add to the AlkanBoilingPoint entity a new date field named Date and fill it.
Exercise 15
Add to the AlkanBoilingPoint entity a calculated LogP value using a Chemical Terms field.
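Exercises 12 to 15 build a table, add stored fields to it, and then add a field whose value is computed from the structure. The sketch below reproduces that sequence by analogy in SQLite and is not how InstantJChem stores chemical tables. Structures are assumed to be plain SMILES strings. The function `carbon_count` is a hypothetical stand-in for a real descriptor such as LogP. BoilingPoint and Date are left empty because the exercise data is not reproduced here.

```python
import sqlite3


def carbon_count(smiles):
    # Stand-in for a real calculated property; only meaningful for simple alkane SMILES.
    return smiles.count("C")


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL so that it behaves like a calculated field.
conn.create_function("carbon_count", 1, carbon_count)

# Exercise 12: a table with a structure column and a floating-point field.
conn.execute(
    "CREATE TABLE AlkanBoilingPoint ("
    " id INTEGER PRIMARY KEY,"
    " structure TEXT NOT NULL,"
    " BoilingPoint REAL)"
)
# Exercise 14: add a stored date field afterwards.
conn.execute('ALTER TABLE AlkanBoilingPoint ADD COLUMN "Date" TEXT')

# Exercise 13: rows are added; measured values would be filled in from the exercise data.
conn.executemany(
    "INSERT INTO AlkanBoilingPoint (structure) VALUES (?)",
    [("C",), ("CC",), ("CCC",)],
)

# Exercise 15: the calculated value is derived from the structure at query time.
rows = conn.execute(
    "SELECT structure, carbon_count(structure) FROM AlkanBoilingPoint ORDER BY id"
).fetchall()
columns = [info[1] for info in conn.execute("PRAGMA table_info(AlkanBoilingPoint)")]
```

A stored field holds whatever was entered. A calculated field is recomputed from the structure, so it stays consistent if the structure is edited.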
Summary
• Create a project and schema
• Import data
• Search by substructure, superstructure, similarity, and exact match
• Search by keyword
• Combine queries and result lists
• Export query results
• Create a new database
Conclusion
• InstantJChem is a chemoinformatics layer on top of a standard DBMS.
• It provides many more chemoinformatics services (database overlap, QSPR modeling, plots, enumeration, scripting).
Diagram labels: DBMS, InstantJChem