Virtual Laboratory: Enabling Distributed Molecular Modelling for Drug Discovery on the Grid


WW Grid


Rajkumar Buyya

Grid Computing and Distributed Systems (GRIDS) Lab, The University of Melbourne, Melbourne, Australia
www.gridbus.org/vlab/

Agenda
  • Introduction
    • Molecular Docking Application Needs
  • Virtual Lab Architecture
  • Grid Enabling CDB (chemical databases)
  • Application Composition
  • Scheduling Experiments
  • Conclusions
Drug Design: Data-Intensive Computing on the Grid
  • It involves screening millions of chemical compounds (molecules) in the Chemical DataBase (CDB) to identify those with the potential to serve as drug candidates.

[Figure: molecules from the chemical databases (legacy, in .MOL2 format) screened against a protein. Collaboration with WEHI for Medical Science, Melbourne.]

Using Basic Job Submission Commands

Do it all yourself! (manually)

Total Cost: $???

Build Distributed Application & Scheduler

Build each application on a case-by-case basis

Complicated construction

E.g., MPI-based

Total Cost: $???

Rapid Parameterisation and Deployment Using the Gridbus and Nimrod-G Tools

Compose, Submit, & Play!

Docking Application Requirements

[Figure: chemical databases (legacy, in .MOL2 format)]
  • It is compute intensive:
    • Each docking job can take from a few minutes to hours, depending on the structural complexity of the molecule.
  • It is data intensive:
    • The databases are huge (MBs to GBs) and each contains thousands of molecules. Screening all molecules in all databases is a real data challenge!
  • The CDBs are distributed.
  • It is a killer application for the Grid.
DataGrid Brokering

[Figure: DataGrid brokering. The user asks the Nimrod/G computational Grid broker to "Screen 2K molecules in 30 min. for $10". Components: Nimrod/G computational Grid broker; CDB broker (Algorithm1 ... AlgorithmN); Data Replica Catalogue; Grid Info. Service; CDB services; Grid Service Providers GSP1 ... GSPn. Messages exchanged (numbered 1 to 7 in the figure): "advise CDB source?", "CDB replicas please?", "Is GSP4 healthy?", "selection & advise: use GSP4!", "Screen mol.5 please?", "mol.5 please?", "process & send results".]
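The replica-selection step of this flow (the CDB broker answering "advise CDB source?") can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory replica catalogue, a boolean health flag standing in for the Grid Info. Service check, and a hypothetical per-replica cost; the slides do not say what Algorithm1 ... AlgorithmN actually compute, so the pluggable `score` function is only a stand-in for them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Replica:
    site: str      # Grid Service Provider hosting a copy, e.g. "GSP4"
    healthy: bool  # stand-in for the answer to "Is GSP4 healthy?"
    cost: float    # hypothetical access cost (latency, price, ...)


# Hypothetical replica catalogue: CDB name -> replicas of that database.
CATALOGUE: Dict[str, List[Replica]] = {
    "aldrich_300": [
        Replica("GSP1", healthy=False, cost=1.0),
        Replica("GSP2", healthy=True, cost=9.0),
        Replica("GSP4", healthy=True, cost=3.0),
    ],
}


def advise_cdb_source(
    cdb: str,
    catalogue: Dict[str, List[Replica]],
    score: Callable[[Replica], float] = lambda r: r.cost,
) -> str:
    """Return the site of the best healthy replica of `cdb` under `score`."""
    replicas = catalogue.get(cdb, [])             # "CDB replicas please?"
    healthy = [r for r in replicas if r.healthy]  # drop unhealthy providers
    if not healthy:
        raise LookupError(f"no healthy replica of {cdb!r}")
    return min(healthy, key=score).site           # "use GSP4!"


print(advise_cdb_source("aldrich_300", CATALOGUE))
```

GSP1 is cheapest here but unhealthy, so the sketch advises GSP4. Passing a different `score` is one way to model swapping selection algorithms.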

Software Tools
  • Molecular Modelling Application (DOCK)
  • Parameter Modelling Tools (Nimrod/enFusion)
  • Grid Resource Broker (Nimrod-G)
  • Data Grid Broker
  • Chemical DataBase (CDB) Management and Intelligent Access Tools
    • PDB database Lookup/Index Table Generation.
    • PDB and associated index-table Replication.
    • PDB Replica Catalogue (that helps in Resource Discovery).
    • PDB Servers (that serve PDB client requests).
    • PDB Brokering (Replica Selection).
    • PDB Clients for fetching Molecule Records (Data Movement).
  • Grid Middleware (Globus and GrACE)
  • Grid Fabric Management (Fork/LSF/Condor/Codine/…)
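The index-table generation and record-fetching tools can be sketched in Python, assuming each molecule in a .MOL2 file starts with a `@<TRIPOS>MOLECULE` record (as in the Tripos MOL2 format) and that the index stores byte offsets so a server can return one molecule with a single seek. The function names are hypothetical, not the Virtual Lab's actual tools.

```python
import io
from typing import BinaryIO, List, Tuple

MARKER = b"@<TRIPOS>MOLECULE"  # first line of every molecule in a MOL2 file


def build_index(f: BinaryIO) -> List[Tuple[int, int]]:
    """Scan a MOL2 file once and return (start, end) byte offsets per molecule."""
    f.seek(0)
    starts: List[int] = []
    pos = 0
    for line in f:
        if line.startswith(MARKER):
            starts.append(pos)
        pos += len(line)
    ends = starts[1:] + [pos]  # each record ends where the next one starts
    return list(zip(starts, ends))


def fetch_molecule(f: BinaryIO, index: List[Tuple[int, int]], n: int) -> bytes:
    """Return molecule n (1-based, like ligand_number) with a single seek."""
    if not 1 <= n <= len(index):
        raise IndexError(f"molecule {n} not in index")
    start, end = index[n - 1]
    f.seek(start)
    return f.read(end - start)


cdb = io.BytesIO(
    b"@<TRIPOS>MOLECULE\nmol_1\n@<TRIPOS>ATOM\n1 C1 0.0 0.0 0.0 C.3\n"
    b"@<TRIPOS>MOLECULE\nmol_2\n"
)
idx = build_index(cdb)
print(len(idx), fetch_molecule(cdb, idx, 2))
```

Building the index once and replicating it alongside the database is what makes per-molecule fetches cheap compared with rescanning a multi-GB file for every job.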
The Virtual Lab: Software Stack
  • APPLICATIONS: Molecular Modelling for Drug Design
  • PROGRAMMING TOOLS: Nimrod and Virtual Lab Tools [parametric programming language, GUI tools, and CDB indexer]
  • USER LEVEL MIDDLEWARE: Nimrod-G and CDB Data Broker [task farming engine, scheduler, dispatcher, agents, CDB (chemical database) server]
  • CORE MIDDLEWARE: Globus [security, information, job submission]
  • FABRIC: Worldwide Grid [distributed computers and databases with different architectures, OSes, and local resource management systems]

Databases shown in the figure: CDB, PDB.

V-Lab Components Interaction

[Figure: V-Lab components interaction across the User Node, Grid Node, Compute Node, and the CDB Service on Grid. Components shown: Grid Tools and Applications; User Process; Nimrod-G Grid Broker (Task Farming Engine, Grid Scheduler, Grid Dispatcher); Grid Info Server; Grid Trade Server; Process Server; Nimrod Agent; Local Resource Manager; Docking Process; CDB Client; CDB Server; File Server; Index and CDB1 ... CDBm. Interactions shown: "Do this in 30 min. for $10?"; 'Get molecule "n" record from "abc" CDB'; 'Molecule "n" Location?'; "Get mol. record"; "File access".]
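The CDB client/server exchange ('Get molecule "n" record from "abc" CDB') can be sketched as a tiny TCP service in Python. The one-line request format `GET <db> <n>`, the in-memory database, and all function names are assumptions for illustration; the slides only show that the server is reached at a host and port (CDB_SERVER and CDB_PORT_NO in the plan file), not its wire protocol.

```python
import socket
import socketserver
import threading

# Hypothetical in-memory CDB: database name -> MOL2 records (1-based on the wire).
CDB = {
    "abc": [b"@<TRIPOS>MOLECULE\nmol_1\n", b"@<TRIPOS>MOLECULE\nmol_2\n"],
}


class CDBHandler(socketserver.StreamRequestHandler):
    """Serve one request of the assumed form b'GET <db> <n>\\n'."""

    def handle(self) -> None:
        parts = self.rfile.readline().decode("ascii", "replace").split()
        record = b""  # empty reply signals "not found"
        if len(parts) == 3 and parts[0] == "GET" and parts[2].isdigit():
            records = CDB.get(parts[1], [])
            n = int(parts[2])
            if 1 <= n <= len(records):
                record = records[n - 1]
        self.wfile.write(record)


def start_cdb_server(host: str = "127.0.0.1", port: int = 0) -> socketserver.ThreadingTCPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), CDBHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_molecule(host: str, port: int, db: str, n: int) -> bytes:
    """CDB client: request molecule n from database db, read until the server closes."""
    with socket.create_connection((host, port)) as s:
        s.sendall(f"GET {db} {n}\n".encode("ascii"))
        chunks = []
        while chunk := s.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    server = start_cdb_server()
    host, port = server.server_address[:2]
    print(get_molecule(host, port, "abc", 2))
    server.shutdown()
    server.server_close()
```

Closing the connection after each reply keeps the client trivial (read to EOF); a production service would more likely length-prefix replies and reuse connections.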

DOCK Code* (Enhanced by WEHI, U of Melbourne)
  • A program to evaluate the chemical and geometric complementarity between a small molecule and a macromolecular binding site.
  • It explores ways in which two molecules, such as a drug and an enzyme or protein receptor, might fit together.
  • Compounds which dock to each other well, like pieces of a three-dimensional jigsaw puzzle, have the potential to bind.
  • So, why is it important to be able to identify small molecules which may bind to a target macromolecule?
  • A compound which binds to a biological macromolecule may inhibit its function, and thus act as a drug.
  • E.g., disabling the ability of a virus (such as HIV) to attach itself to a molecule/protein!
  • With system-specific code changes, we have been able to compile it for Sun Solaris, PC Linux, SGI IRIX, and Compaq Alpha/OSF1.

* Original Code: University of California, San Francisco: http://www.cmpharm.ucsf.edu/kuntz/

Dock Input File

(The molecule to be screened is given by ligand_atom_file.)

score_ligand yes
minimize_ligand yes
multiple_ligands no
random_seed 7
anchor_search no
torsion_drive yes
clash_overlap 0.5
conformation_cutoff_factor 3
torsion_minimize yes
match_receptor_sites no
random_search yes
. . . . . .
. . . . . .
maximum_cycles 1
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
score_grid_prefix ece
vdw_definition_file parameter/vdw.defn
chemical_definition_file parameter/chem.defn
chemical_score_file parameter/chem_score.tbl
flex_definition_file parameter/flex.defn
flex_drive_file parameter/flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
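Since a DOCK input file is a flat list of `keyword value` lines, it is easy to inspect programmatically, for example to list which files a job has to stage in or collect. A minimal Python sketch, assuming one keyword per line separated from its value by whitespace and skipping the ". . ." elision lines used on the slide; `parse_dock_input` is a hypothetical helper, not part of DOCK.

```python
from typing import Dict


def parse_dock_input(text: str) -> Dict[str, str]:
    """Map each DOCK keyword to its (string) value."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("."):  # blank or elided lines
            continue
        parts = line.split(None, 1)           # keyword, then the rest of the line
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


sample = """score_ligand yes
random_seed 7
clash_overlap 0.5
. . . . . .
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
"""
params = parse_dock_input(sample)
# In this sample, keywords ending in _file name input or output files of the job.
files = sorted(v for k, v in params.items() if k.endswith("_file"))
print(params["random_seed"], files)
```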

1. Parameterize the Dock Input File (using Nimrod Tools: GUI/language)

(The molecule to be screened is now selected via ${ligand_number}.)

score_ligand $score_ligand
minimize_ligand $minimize_ligand
multiple_ligands $multiple_ligands
random_seed $random_seed
anchor_search $anchor_search
torsion_drive $torsion_drive
clash_overlap $clash_overlap
conformation_cutoff_factor $conformation_cutoff_factor
torsion_minimize $torsion_minimize
match_receptor_sites $match_receptor_sites
random_search $random_search
. . . . . .
. . . . . .
maximum_cycles $maximum_cycles
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
vdw_definition_file vdw.defn
chemical_definition_file chem.defn
chemical_score_file chem_score.tbl
flex_definition_file flex.defn
flex_drive_file flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
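Nimrod's `substitute` step fills these placeholders for each job. Python's `string.Template` happens to use the same `$name` / `${name}` syntax, so it can stand in to show what one job's generated input looks like. This is an illustration, not Nimrod's implementation, and the `HOME` value and helper name below are hypothetical.

```python
from string import Template

# A few lines of the parameterized DOCK input (dock_base).
DOCK_BASE = """score_ligand $score_ligand
random_seed $random_seed
clash_overlap $clash_overlap
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
"""


def make_dock_run(values: dict) -> str:
    """Substitute one job's parameter values; raises KeyError if any is missing."""
    return Template(DOCK_BASE).substitute(values)


job = {
    "score_ligand": "yes",
    "random_seed": 7,
    "clash_overlap": 0.5,
    "ligand_number": 42,
    "receptor_site_file": "ece.sph",
    "HOME": "/home/vlab",  # hypothetical home directory on the node
}
print(make_dock_run(job))
```

Using `substitute` rather than `safe_substitute` makes a misspelled or undefined parameter fail loudly before the job is dispatched, instead of leaving a literal `$name` in the DOCK input.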

2. Create Docking Plan: Define Variables and Their Values

parameter database_name label "database_name" text select oneof "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre" "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n" "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute" "molecular_science" "molecular_diversity_preservation" "national_cancer_institute" "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE" default "aldrich_300";
parameter CDB_SERVER text default "bezek.dstc.monash.edu.au";
parameter CDB_PORT_NO text default "5001";
parameter score_ligand text default "yes";
parameter minimize_ligand text default "yes";
parameter multiple_ligands text default "no";
parameter random_seed integer default 7;
parameter anchor_search text default "no";
parameter torsion_drive text default "yes";
parameter clash_overlap float default 0.5;
parameter conformation_cutoff_factor integer default 5;
parameter torsion_minimize text default "yes";
parameter match_receptor_sites text default "no";
. . . . . .
. . . . . .
parameter maximum_cycles integer default 1;
parameter receptor_site_file text default "ece.sph";
parameter score_grid_prefix text default "ece";
parameter ligand_number integer range from 1 to 2000 step 1;

(The ligand_number range selects the molecules to be screened.)
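A parameter sweep yields one job per combination of parameter values. In this plan only ligand_number ranges (1 to 2000), while database_name is a single selected value and the rest take defaults, so the plan should expand to 2000 jobs. A minimal Python sketch of that expansion, assuming single values stay fixed and lists/ranges are swept as a Cartesian product; the `j<N>` job names are a hypothetical stand-in for Nimrod's $jobname.

```python
from itertools import product
from typing import Dict, List


def expand(parameters: Dict[str, object]) -> List[Dict[str, object]]:
    """Cartesian product over swept parameters (lists/ranges); scalars stay fixed."""
    names = list(parameters)
    axes = [v if isinstance(v, (list, range)) else [v] for v in parameters.values()]
    jobs = []
    for i, combo in enumerate(product(*axes), start=1):
        job = dict(zip(names, combo))
        job["jobname"] = f"j{i}"  # hypothetical naming scheme for $jobname
        jobs.append(job)
    return jobs


plan = {
    "database_name": "aldrich_300",   # select oneof ...: one value chosen
    "CDB_SERVER": "bezek.dstc.monash.edu.au",
    "CDB_PORT_NO": "5001",
    "random_seed": 7,
    "clash_overlap": 0.5,
    "receptor_site_file": "ece.sph",
    "ligand_number": range(1, 2001),  # range from 1 to 2000 step 1
}
jobs = expand(plan)
print(len(jobs), jobs[0]["ligand_number"], jobs[-1]["jobname"])
```

Adding a second swept parameter (for example, several random seeds) would multiply the job count, which is why sweeps are usually kept to the dimension that actually varies.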

3. Create Docking Plan File: Define the Tasks that Jobs Need to Do

task nodestart
    copy ./parameter/vdw.defn node:.
    copy ./parameter/chem.defn node:.
    copy ./parameter/chem_score.tbl node:.
    copy ./parameter/flex.defn node:.
    copy ./parameter/flex_drive.tbl node:.
    copy ./dock_inputs/get_molecule node:.
    copy ./dock_inputs/dock_base node:.
endtask

task main
    node:substitute dock_base dock_run
    node:substitute get_molecule get_molecule_fetch
    node:execute sh ./get_molecule_fetch
    node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
    copy node:dock_out ./results/dock_out.$jobname
    copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
    copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
    copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
endtask
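As we read this plan, `nodestart` stages the shared parameter files once per node, while `main` runs for every job and suffixes its results with $jobname so that jobs sharing a node do not overwrite each other. The Python dry-run sketch below prints the commands that reading implies; it executes nothing, and the job-to-node assignment, node names, and `OS` value are hypothetical.

```python
from typing import List, Tuple

NODESTART_FILES = [
    "./parameter/vdw.defn", "./parameter/chem.defn", "./parameter/chem_score.tbl",
    "./parameter/flex.defn", "./parameter/flex_drive.tbl",
    "./dock_inputs/get_molecule", "./dock_inputs/dock_base",
]
RESULT_FILES = ["dock_out", "dock_cnt.mol2", "dock_chm.mol2", "dock_nrg.mol2"]


def dry_run(assignments: List[Tuple[str, str]], os_name: str) -> List[str]:
    """Expand (jobname, node) pairs into the copy/substitute/execute steps."""
    started = set()
    steps: List[str] = []
    for jobname, node in assignments:
        if node not in started:  # task nodestart: once per node
            started.add(node)
            steps += [f"copy {src} {node}:." for src in NODESTART_FILES]
        # task main: once per job
        steps += [
            f"{node}:substitute dock_base dock_run",
            f"{node}:substitute get_molecule get_molecule_fetch",
            f"{node}:execute sh ./get_molecule_fetch",
            f"{node}:execute $HOME/bin/dock.{os_name} -i dock_run -o dock_out",
        ]
        steps += [f"copy {node}:{name} ./results/{name}.{jobname}" for name in RESULT_FILES]
    return steps


for step in dry_run([("j1", "node1"), ("j2", "node1"), ("j3", "node2")], "Linux"):
    print(step)
```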

Chemical DataBase (CDB)
  • Databases consist of small molecules from commercially available organic synthesis libraries, and natural product databases.
  • There is also the ability to screen virtual combinatorial databases, in their entirety.
  • This methodology allows only the required compounds to be subjected to physical screening and/or synthesis, reducing both time and expense.
Target Testcase
  • The target for the test case: endothelin converting enzyme (ECE). This is involved in “heart stroke” and other transient ischemia.
  • Ischemia: a decrease in the blood supply to a bodily organ, tissue, or part, caused by constriction or obstruction of the blood vessels.
Scheduling Molecular Docking Application on Grid: Experiment
  • Workload: docking 200 molecules with ECE
    • 200 jobs, each needing on the order of 3 minutes, depending on molecular weight.
  • Deadline: 60 min.; budget: 50,000 G$ (tokens)
  • Strategy: minimise time / cost
  • Execution cost under each strategy:
    • Optimise Cost: 14,277 G$ (finished in 59.30 min.)
    • Optimise Time: 17,702 G$ (finished in 34 min.)
    • In this experiment, time-optimised scheduling costs an extra 3,425 G$ (about 3.4K G$) compared to cost-optimised scheduling.
    • Users can now trade off time against cost.
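The two strategies can be illustrated with a toy deadline-and-budget-constrained scheduler in Python. This is a simplified heuristic written for illustration, not Nimrod-G's actual scheduling algorithm, and the two resources with their prices, per-job times, and slot counts are hypothetical, not the experiment's testbed. Cost optimisation fills the cheapest resource up to what it can finish by the deadline; time optimisation sends each job where it would finish earliest, while keeping enough budget to run every remaining job on the cheapest resource.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Resource:
    name: str
    price: float    # G$ per job (hypothetical)
    minutes: float  # run time of one job on this resource (hypothetical)
    slots: int      # jobs the resource runs concurrently


def finish_time(r: Resource, n_jobs: int) -> float:
    """Time for r to finish n_jobs, running them in waves of r.slots."""
    return -(-n_jobs // r.slots) * r.minutes  # ceiling division


def cost_optimise(jobs: int, resources: List[Resource], deadline: float) -> Dict[str, int]:
    """Fill the cheapest resources first, each up to what it can finish by the deadline."""
    plan: Dict[str, int] = {}
    left = jobs
    for r in sorted(resources, key=lambda r: r.price):
        take = min(left, r.slots * int(deadline // r.minutes))
        if take:
            plan[r.name] = take
            left -= take
        if left == 0:
            return plan
    raise ValueError("deadline cannot be met")


def time_optimise(jobs: int, resources: List[Resource], budget: float) -> Dict[str, int]:
    """Send each job where it would finish earliest, keeping enough budget
    to run every remaining job on the cheapest resource."""
    counts = {r.name: 0 for r in resources}
    cheapest = min(r.price for r in resources)
    spent = 0.0
    for j in range(jobs):
        reserve = (jobs - j - 1) * cheapest
        affordable = [r for r in resources if spent + r.price + reserve <= budget]
        if not affordable:
            raise ValueError("budget cannot be met")
        best = min(affordable, key=lambda r: finish_time(r, counts[r.name] + 1))
        counts[best.name] += 1
        spent += best.price
    return {name: n for name, n in counts.items() if n}


def summarise(plan: Dict[str, int], resources: List[Resource]) -> tuple:
    """Return (total cost, completion time) of a plan."""
    by_name = {r.name: r for r in resources}
    cost = sum(by_name[n].price * k for n, k in plan.items())
    makespan = max(finish_time(by_name[n], k) for n, k in plan.items())
    return cost, makespan


grid = [Resource("cheap", 50, 3, 4), Resource("fast", 100, 3, 10)]
for label, plan in [
    ("cost", cost_optimise(200, grid, deadline=60)),
    ("time", time_optimise(200, grid, budget=50_000)),
]:
    print(label, plan, summarise(plan, grid))
```

With these made-up numbers, the cost-optimised plan costs 16,000 G$ and finishes at 60 min, while the time-optimised plan costs 17,000 G$ and finishes at 45 min: the same qualitative trade-off as the experiment, though the figures themselves mean nothing beyond the toy.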
WWG (World Wide Grid) Setup
  • Australia: Melbourne+Monash U: VPAC, Physics
  • North America: ANL: SGI/Sun/SP2; NCSA: Cluster; Wisc: PC/cluster; NRC, Canada; many others
  • Europe: ZIB: T3E/Onyx; AEI: Onyx; CNR: Cluster; CUNI/CZ: Onyx; Poznan: SGI/SP2; Vrije U: Cluster; Cardiff: Sun E6500; Portsmouth: Linux PC; Manchester: O3K; Cambridge: SGI; many others
  • Asia: AIST, Japan: Solaris Cluster; Osaka University: Cluster; Doshisha: Linux cluster; Korea: Linux cluster
  • Also shown: Gridbus+Nimrod-G, GMonitor, MEG Visualisation, Solaris WS, and the Grid Market Directory, all connected over the Internet.

Summary and Conclusion
  • Applications can be Grid-enabled and deployed on the Grid with minimal effort, but they need the right set of Grid tools.
  • Distributed docking demonstrates that the Nimrod-G and Gridbus tools:
    • Enable rapid Grid application software engineering.
    • Provide powerful runtime machinery for optimal deployment of applications on the Grid.
  • Easy-to-use tools for composing applications to run on the Grid are essential to attracting the application community and getting it on board.
  • Work in progress: integrating with our Data Grid Broker to support dynamic selection of CDB nodes.
Thanks

http://www.gridbus.org/vlab

Parametric Processing

Multiple Runs, Same Program, Multiple Data: a killer application for the Grid!

[Figure: parameters fed into a "Magic Engine for Manufacturing Humans!"]

Courtesy: Anand Natrajan, University of Virginia
