Overview of Chemical Informatics and Cyberinfrastructure Collaboratory. March 15 2007 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 [email protected] http://www.infomall.org http://www.chembiogrid.org.
Indiana University is focusing on two major areas:
From Biology, Chemistry, Computer Science, Informatics
at IU Bloomington and IUPUI (Indianapolis)
Funded by the National Institutes of Health
CICC Combines Grid Computing with Chemical Informatics
Large Scale Computing Challenges
Science and Cyberinfrastructure
CICC is an NIH-funded project to support the chemical informatics needs of High Throughput Cancer Screening Centers. The NIH is creating a data deluge of publicly available data on potential new drugs.
Chemical informatics is a non-traditional area of high performance computing, but it offers many new, challenging problems to investigate.
OSCAR-mined molecular signatures can be clustered, filtered for toxicity, and docked onto larger proteins. These are classic “pleasingly parallel” tasks. Top-ranking docked molecules can be further examined for drug potential.
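A "pleasingly parallel" task is one where each molecule can be processed independently of the others. The following is a minimal sketch of that pattern, not the CICC implementation: `score_molecule` is a hypothetical placeholder standing in for a real clustering, toxicity, or docking call, and the molecule identifiers are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def score_molecule(molecule_id):
    # Hypothetical placeholder: a real task would call a docking or
    # toxicity-filter program here. Each call depends only on its own input.
    return sum(ord(ch) for ch in molecule_id) % 100


def score_all(molecule_ids, workers=4):
    # Independent tasks need no communication, so they can simply be mapped
    # across a pool of workers and collected in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score_molecule, molecule_ids))
    return dict(zip(molecule_ids, scores))


def top_ranked(scores, n=2):
    # Keep the n highest-scoring molecules for further examination.
    return sorted(scores, key=scores.get, reverse=True)[:n]


scores = score_all(["mol_a", "mol_b", "mol_c"])
best = top_ranked(scores)
```

Threads are used here only to keep the sketch short. CPU-bound work on a machine such as Big Red would more plausibly run as separate processes or batch jobs, but the independence of the tasks is the same.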
Chemical informatics text analysis programs can process hundreds of thousands of abstracts of online journal articles to extract chemical signatures of potential drugs.
Big Red (and the TeraGrid) will also enable us to perform time-consuming, multi-step quantum chemistry calculations on all of PubMed. Results go back to public databases that are freely accessible to the scientific community.
Indiana University Department of Chemistry, School of Informatics, and Pervasive Technology Laboratories
OSCAR Document Analysis Collaboratory
Computational Chemistry (Gamess, Jaguar etc.)
Job Submission and Management
IU Big Red
TeraGrid, Open Science Grid
Collaboration as in Sakai
CICC Web Service Infrastructure
Web Service Locations Collaboratory
University of Cologne
Where Does The Functionality Come From? Collaboratory
University of Michigan
European Chemicals Bureau
Giving a class remotely to UM students with video and web conferencing
Percent Inhibition or IC50 data is retrieved from HTS
Grids can link data analysis (e.g., image processing developed in existing Grids), traditional cheminformatics tools, and annotation tools (Semantic Web, del.icio.us) to enhance lead ID and SAR analysis
A Grid of Grids linking collections of services at PubChem
Workflows encoding plate & control well statistics, distribution analysis, etc
Question: Was this screen successful?
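The slides do not say which plate and control-well statistics the workflow computes. One widely used measure of screen quality is the Z'-factor, which compares the spread of the positive and negative control wells with the distance between their means. The sketch below assumes that statistic; the control readings are invented for illustration.

```python
from statistics import mean, stdev


def z_prime(positive_controls, negative_controls):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive_controls) - mean(negative_controls))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    spread = stdev(positive_controls) + stdev(negative_controls)
    return 1.0 - 3.0 * spread / separation


# Hypothetical percent-inhibition readings from one plate's control wells.
positive = [95.0, 97.0, 96.0, 94.0]
negative = [2.0, 4.0, 3.0, 5.0]
score = z_prime(positive, negative)
```

By common convention, a Z'-factor above roughly 0.5 is taken to indicate a well-separated assay, while values near or below zero suggest the controls overlap too much for the plate to be trusted.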
Workflows encoding distribution analysis of screening results
Question: What should the active/inactive cutoffs be?
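As an illustration of how a distribution analysis might suggest a cutoff, the sketch below places the threshold a fixed number of robust standard deviations above the median of the screening results. This is an assumed rule, not the one the CICC workflows necessarily use; the median and median absolute deviation (MAD) are chosen because a few strong actives do not inflate them the way they inflate a mean and standard deviation. The data values are invented.

```python
from statistics import median


def robust_cutoff(percent_inhibition, k=3.0):
    # 1.4826 scales the MAD so that it estimates the standard deviation
    # when the bulk of the data is normally distributed.
    center = median(percent_inhibition)
    mad = median(abs(value - center) for value in percent_inhibition)
    return center + k * 1.4826 * mad


def label_actives(percent_inhibition, cutoff):
    # True marks a compound at or above the cutoff (active).
    return [value >= cutoff for value in percent_inhibition]


# Hypothetical percent-inhibition values: mostly inactive, one strong hit.
results = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 90.0]
cutoff = robust_cutoff(results)
labels = label_actives(results, cutoff)
```

The multiplier `k` is the tunable part of the answer: a larger value yields fewer, more confident actives.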
Question: What can we learn about the target protein or cell line from this screen?
Workflows encoding statistical comparison of results to similar screens, docking of compounds into proteins to correlate binding with activity, literature search of active compounds, etc.
Compounds submitted to PubChem
A protein implicated in tumor growth with a known ligand is selected (in this case HSP90 taken from the PDB 1Y4 complex)
The screening data from a cellular HTS assay is searched for compounds whose 2D structures are similar to that of the ligand.
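2D similarity searches are commonly based on comparing structural fingerprints with the Tanimoto coefficient. The sketch below assumes that approach; the slides do not name the similarity measure. Fingerprints are represented as plain sets of "on" bit positions, and the compound names and bit values are hypothetical. In practice a cheminformatics toolkit would generate the fingerprints.

```python
def tanimoto(fp_a, fp_b):
    # Shared bits divided by all bits set in either fingerprint.
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_compounds(query_fp, library, threshold=0.7):
    # Score every library compound against the query ligand and keep
    # those at or above the threshold, most similar first.
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    kept = [hit for hit in hits if hit[1] >= threshold]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints, given as sets of on-bit positions.
ligand = {1, 2, 3, 4}
library = {
    "cmpd_a": {1, 2, 3, 4, 5},
    "cmpd_b": {1, 2, 9},
    "cmpd_c": {1, 2, 3, 4},
}
hits = similar_compounds(ligand, library)
```

The threshold of 0.7 is an arbitrary illustrative choice; a looser value returns more, less closely related structures.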
Docking results and activity patterns fed into R services for building of activity models and correlations
Similar structures are filtered for druggability, converted to 3D, and automatically passed to the OpenEye FRED docking program for docking into the target protein.
Once docking is complete, the user visualizes the high-scoring docked structures in a portlet using the Jmol applet.
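The slides do not specify the filter applied at this step. A common choice is Lipinski's rule of five, which the sketch below assumes: a compound is kept if it breaks at most one of four limits on molecular weight, logP, hydrogen-bond donors, and hydrogen-bond acceptors. The descriptor values are hypothetical and would normally come from a cheminformatics toolkit.

```python
def rule_of_five_violations(descriptors):
    # Count how many of the four Lipinski limits the compound exceeds.
    checks = [
        descriptors["mol_weight"] > 500,
        descriptors["logp"] > 5,
        descriptors["h_donors"] > 5,
        descriptors["h_acceptors"] > 10,
    ]
    return sum(checks)


def filter_drug_like(compounds, max_violations=1):
    # Keep the names of compounds with an acceptable number of violations.
    return [
        name
        for name, descriptors in compounds.items()
        if rule_of_five_violations(descriptors) <= max_violations
    ]


# Hypothetical precomputed descriptors for two candidate structures.
candidates = {
    "cmpd_a": {"mol_weight": 350.4, "logp": 2.1, "h_donors": 2, "h_acceptors": 5},
    "cmpd_b": {"mol_weight": 720.9, "logp": 6.3, "h_donors": 7, "h_acceptors": 12},
}
kept = filter_drug_like(candidates)
```

Only the compounds in `kept` would then move on to 3D conversion and docking.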
Similar structures to the ligand can be browsed using client portlets.
Simulation Service: FORTRAN Code,
DB Service: Queries, Clustering, Curation, etc.
PubChem, PDB, NCI, etc.