CICC Chemical Compound Mining Workflows

Jungkee (Jake) Kim, Community Grids Laboratory. A workflow for the Big Red demo. “Big Red” is one of the fastest supercomputers. The workflow mines chemical compounds found in research paper texts and shows them in 3D graphics, starting from PubMed abstracts processed with OSCAR3 to obtain SMILES.


Presentation Transcript


  1. CICC Chemical Compound Mining Workflows
     Jungkee (Jake) Kim, Community Grids Laboratory

  2. A Workflow for Big Red Demo I
     • “Big Red” is one of the fastest supercomputers
     • Mining chemical compounds found in research paper texts and showing them in 3D graphics
     • Workflow stages (the files passed between stages are shown in brackets):
       PubMed Abstracts → [Text files] → OSCAR3 → [XML files] → SMILES Extraction → [SMILES] → Converting the format → [SDF files] → Molecular & Quantum Mechanics → [SDF files] → Converting to pictures → [POV, JPG files] → Generating HTML script
     CICC Project Meeting

  3. A Workflow for Big Red Demo II
     • Final HTML pages

  4. A Workflow for Big Red Demo III
     • PubMed abstracts
       • 555,007 PubMed abstracts from 2005–2006 (partial set), R. Guha
       • 1,000 abstracts distributed to each node (simple parallelism)
       • 511 nodes × 1,000 input abstracts used for the demo
     • OSCAR3
       • A Cambridge tool that extracts chemical information from text and produces an XML instance highlighting the chemical information
       • Used a revised version for convenient batch processing (the original had some incompatibility with the Big Red architecture)
     • SMILES extraction
       • Extracting SMILES elements from OSCAR’s XML output files
       • A list of unique SMILES is kept within each batch
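The batching and SMILES extraction steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the code used in the demo. It assumes that the OSCAR3 XML carries each recognized compound's structure in a `SMILES` attribute on an inline element; the element name `ne`, the sample documents, and the helper names `batches` and `unique_smiles` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List


def batches(items: List[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items (one chunk per node)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def unique_smiles(xml_documents: Iterable[str], attr: str = "SMILES") -> List[str]:
    """Collect distinct SMILES strings from one batch, in first-seen order.

    Assumption: the structure is stored in an attribute named `attr`
    on some element of each XML document.
    """
    seen = set()
    ordered = []
    for doc in xml_documents:
        root = ET.fromstring(doc)
        for element in root.iter():
            smiles = element.get(attr)
            if smiles and smiles not in seen:
                seen.add(smiles)
                ordered.append(smiles)
    return ordered


# Hypothetical OSCAR-style annotated output for two abstracts.
SAMPLE = [
    '<PAPER><P>We used <ne type="CM" SMILES="CCO">ethanol</ne> and '
    '<ne type="CM" SMILES="O">water</ne>.</P></PAPER>',
    '<PAPER><P><ne type="CM" SMILES="CCO">Ethanol</ne> was dried.</P></PAPER>',
]

if __name__ == "__main__":
    for batch in batches(SAMPLE, size=1000):
        print(unique_smiles(batch))
```

Because the unique list is built per batch, the same compound can still appear in the output of several nodes; merging across batches would be a separate step.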

  5. A Workflow for Big Red Demo IV
     • Generating 3D formats (K. Gilbert)
       • Converting from SMILES to SDF format
       • Molecular Mechanics program: “mengine” (MM engine)
       • No Quantum Mechanics (QM) in the demo
     • Converting 3D formats to pictures (J. N. Huffman)
       • Persistence of Vision Raytracer (POV-Ray): converting SDF to POV
       • Another program which converts the POV files to JPEG format
     • Generating HTML script
       • Showing those graphic files in an HTML page
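The last step, showing the rendered pictures in an HTML page, could look something like the sketch below. The slides do not describe the actual script, so the function name `build_gallery`, the page layout, and the file names are all assumptions made for illustration.

```python
from html import escape
from pathlib import PurePosixPath
from typing import Iterable


def build_gallery(image_paths: Iterable[str], title: str = "Compound renderings") -> str:
    """Return a minimal HTML page with one captioned image per JPEG file."""
    figures = []
    for path in image_paths:
        # Use the file name without its extension as the caption.
        label = PurePosixPath(path).stem
        figures.append(
            f'<figure><img src="{escape(path, quote=True)}" '
            f'alt="{escape(label, quote=True)}">'
            f"<figcaption>{escape(label)}</figcaption></figure>"
        )
    body = "\n".join(figures)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical output files from the POV-to-JPEG conversion step.
    page = build_gallery(["out/mol_0001.jpg", "out/mol_0002.jpg"])
    print(page)
```

Escaping the paths and captions keeps the page well-formed even if a file name contains characters such as `&` or quotes.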

  6. Bigger Picture for the Workflow
     Diagram components:
     • NIH PubMed Database
     • OSCAR Text Analysis
     • Cluster Grouping
     • Toxicity Filtering
     • Docking
     • Initial 3D Structure Calculation
     • High Throughput Screening (HTS) Data Organization and Flagging
     • Molecular Mechanics Calculations
     • NIH PubChem Database
     • Quantum Mechanics Calculations
     • Big Red Demo
     • IU’s Varuna Database
     • POV-Ray Parallel Rendering
