Virtual Laboratory: Enabling Distributed Molecular Modelling for Drug Discovery on the Grid

Rajkumar Buyya

Grid Computing and Distributed Systems (GRIDS) Lab, The University of Melbourne, Melbourne, Australia

Agenda

  • Introduction

    • Molecular Docking Application Needs

  • Virtual Lab Architecture

  • Grid Enabling CDB (chemical databases)

  • Application Composition

  • Scheduling Experiments

  • Conclusions

Drug Design: Data Intensive Computing on Grid

  • It involves screening millions of chemical compounds (molecules) in the Chemical DataBase (CDB) to identify those having potential to serve as drug candidates.

Chemical Databases

(legacy, in .MOL2 format)

[Collaboration with WEHI for Medical Science, Melbourne]

Using Basic Job Submission Commands

Do it all yourself! (manually)

Total Cost: $???

Build Distributed Application & Scheduler

Build the application on a case-by-case basis

Complicated Construction

E.g., MPI based

Total Cost: $???

Rapid Parameterisation and Deployment Using the Gridbus and Nimrod-G Tools

Compose, Submit, & Play!

Chemical Databases (legacy, in .MOL2 format)

Nimrod-G Tools

Docking Application Requirements

  • It is compute intensive:

    • Each docking job can take from a few minutes to hours, depending on the structural complexity of the molecule.

  • It is data intensive:

    • The databases are huge (MBs to GBs) and each contains thousands of molecules. Screening all molecules in all databases is a real data challenge!

  • CDBs are distributed.

  • It is a killer application for the Grid.
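To make the compute requirement concrete, the following is a minimal back-of-the-envelope sketch. The workload figures (2,000 molecules, 3 minutes per docking job, a 30-minute deadline) are illustrative assumptions, and the function name `nodes_needed` is hypothetical; real docking times vary from minutes to hours per molecule.

```python
import math

def nodes_needed(num_molecules: int, minutes_per_job: float, deadline_minutes: float) -> int:
    """Minimum number of equally fast nodes needed to finish all jobs by the deadline.

    Assumes every job takes the same time and each node runs one job at a time.
    """
    if minutes_per_job > deadline_minutes:
        raise ValueError("a single job already exceeds the deadline")
    jobs_per_node = math.floor(deadline_minutes / minutes_per_job)
    return math.ceil(num_molecules / jobs_per_node)

# Illustrative numbers only (assumptions, not measurements).
sequential_minutes = 2000 * 3.0
print(sequential_minutes)            # 6000.0 minutes on a single node
print(nodes_needed(2000, 3.0, 30))   # 200 nodes to meet a 30-minute deadline
```

Under these assumptions a single machine would need about 100 hours, which is why the screening jobs are farmed out to many Grid nodes.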

DataGrid Brokering

[Figure: message flow between the Grid Broker, the CDB Broker, the Data Replica Catalogue, the Grid Info. service, and the CDB Services hosted by Grid Service Providers (GSPs) such as GSP3 and GSP4. Callouts recovered from the diagram:]

  • “Screen 2K molecules in 30min. for $10”

  • “CDB replicas please?”

  • “advise CDB source?”

  • “process & send results”

  • “selection & advise: use GSP4!”

  • “Screen mol.5 please?”

  • “Is GSP4 healthy?”

  • “mol.5 please?”
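The replica selection step in the brokering diagram can be sketched roughly as follows. This is a simplified illustration, not the CDB Broker's actual logic: the `CdbReplica` record, the `transfer_cost` metric, and the in-memory `catalogue` dictionary are all hypothetical stand-ins for the Data Replica Catalogue and the health query to the Grid information service.

```python
from dataclasses import dataclass

@dataclass
class CdbReplica:
    provider: str         # Grid Service Provider hosting the CDB service, e.g. "GSP4"
    healthy: bool         # hypothetical result of asking the Grid Info. service
    transfer_cost: float  # hypothetical metric (e.g. latency); lower is better

def select_replica(catalogue, database):
    """Return the healthy replica of `database` with the lowest transfer cost."""
    candidates = [r for r in catalogue.get(database, []) if r.healthy]
    if not candidates:
        raise LookupError(f"no healthy replica for {database!r}")
    return min(candidates, key=lambda r: r.transfer_cost)

# Hypothetical replica catalogue contents.
catalogue = {
    "aldrich_300": [
        CdbReplica("GSP2", healthy=False, transfer_cost=1.0),
        CdbReplica("GSP3", healthy=True, transfer_cost=4.0),
        CdbReplica("GSP4", healthy=True, transfer_cost=2.5),
    ]
}
print(select_replica(catalogue, "aldrich_300").provider)  # GSP4
```

The unhealthy replica is skipped even though it looks cheapest, which mirrors the "Is GSP4 healthy?" check before the broker advises a source.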



Software Tools

  • Molecular Modelling Application (DOCK)

  • Parameter Modelling Tools (Nimrod/enFusion)

  • Grid Resource Broker (Nimrod-G)

  • Data Grid Broker

  • Chemical DataBase (CDB) Management and Intelligent Access Tools

    • PDB database Lookup/Index Table Generation.

    • PDB and associated index-table Replication.

    • PDB Replica Catalogue (that helps in Resource Discovery).

    • PDB Servers (that serve PDB client requests).

    • PDB Brokering (Replica Selection).

    • PDB Clients for fetching Molecule Record (Data Movement).

  • Grid Middleware (Globus and GrACE)

  • Grid Fabric Management (Fork/LSF/Condor/Codine/…)
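The lookup/index table idea in the list above can be illustrated with a short sketch. It assumes the database is a multi-molecule MOL2 file in which each record starts with the `@<TRIPOS>MOLECULE` tag; the index layout (a list of byte offsets) and the function names `build_index` and `fetch_record` are hypothetical, not the format used by the actual tools.

```python
import io

RECORD_TAG = b"@<TRIPOS>MOLECULE"  # start-of-record tag in Tripos MOL2 files

def build_index(db):
    """Scan the database once and return the byte offset of each molecule record."""
    offsets = []
    db.seek(0)
    while True:
        pos = db.tell()
        line = db.readline()
        if not line:
            break
        if line.startswith(RECORD_TAG):
            offsets.append(pos)
    return offsets

def fetch_record(db, offsets, n):
    """Read molecule number n (1-based) directly, without rescanning the file."""
    start = offsets[n - 1]
    end = offsets[n] if n < len(offsets) else None
    db.seek(start)
    return db.read() if end is None else db.read(end - start)

# Tiny in-memory stand-in for a chemical database file.
sample = io.BytesIO(
    b"@<TRIPOS>MOLECULE\nS_1\n@<TRIPOS>ATOM\n"
    b"@<TRIPOS>MOLECULE\nS_2\n@<TRIPOS>ATOM\n"
)
index = build_index(sample)
print(fetch_record(sample, index, 2).decode())
```

With such an index a server can answer a request for molecule "n" by seeking straight to its record, so only that record has to be shipped to the compute node rather than the whole database.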

The Virtual Lab: Software Stack

  • Molecular Modelling for Drug Design

  • Nimrod and Virtual Lab Tools [parametric programming language, GUI tools, and CDB indexer]

  • Nimrod-G and CDB Data Broker [task farming engine, scheduler, dispatcher, agents, CDB (chemical database) server]

  • Globus [security, information, job submission]

  • Worldwide Grid [distributed computers and databases with different architectures, operating systems, and local resource management systems]

V-Lab Components Interaction

[Figure: interaction between components on the User Node, Grid Node, and Compute Node. Components shown: Grid Tools and Applications, Nimrod-G Grid Broker, Grid Info, Grid Trade, CDB Client, CDB Service on Grid, Index and CDB1. Callouts recovered from the diagram:]

  • “Do this in 30 min. for $10?”

  • Get molecule “n” record from “abc” CDB

  • Molecule “n” location?

  • Get mol. record

  • File access

DOCK code* (Enhanced by WEHI, U of Melbourne)

  • A program to evaluate the chemical and geometric complementarities between a small molecule and a macromolecular binding site.

  • It explores ways in which two molecules, such as a drug and an enzyme or protein receptor, might fit together.

  • Compounds which dock to each other well, like pieces of a three-dimensional jigsaw puzzle, have the potential to bind.

  • So, why is it important to be able to identify small molecules which may bind to a target macromolecule?

  • A compound which binds to a biological macromolecule may inhibit its function, and thus act as a drug.

  • E.g., disabling the ability of a virus (such as HIV) to attach itself to a molecule/protein!

  • With the system-specific code changed, we have been able to compile it for Sun-Solaris, PC Linux, SGI IRIX, and Compaq Alpha/OSF1.

* Original Code: University of California, San Francisco:

Dock input file

Molecule to be screened

score_ligand yes

minimize_ligand yes

multiple_ligands no

random_seed 7

anchor_search no

torsion_drive yes

clash_overlap 0.5

conformation_cutoff_factor 3

torsion_minimize yes

match_receptor_sites no

random_search yes

. . . . . .

. . . . . .

maximum_cycles 1

ligand_atom_file S_1.mol2

receptor_site_file ece.sph

score_grid_prefix ece

vdw_definition_file parameter/vdw.defn

chemical_definition_file parameter/chem.defn

chemical_score_file parameter/chem_score.tbl

flex_definition_file parameter/flex.defn

flex_drive_file parameter/flex_drive.tbl

ligand_contact_file dock_cnt.mol2

ligand_chemical_file dock_chm.mol2

ligand_energy_file dock_nrg.mol2

1. Parameterize Dock input file (use Nimrod Tools: GUI/language)

Molecule to be screened

score_ligand $score_ligand

minimize_ligand $minimize_ligand

multiple_ligands $multiple_ligands

random_seed $random_seed

anchor_search $anchor_search

torsion_drive $torsion_drive

clash_overlap $clash_overlap

conformation_cutoff_factor $conformation_cutoff_factor

torsion_minimize $torsion_minimize

match_receptor_sites $match_receptor_sites

random_search $random_search

. . . . . .

. . . . . .

maximum_cycles $maximum_cycles

ligand_atom_file ${ligand_number}.mol2

receptor_site_file $HOME/dock_inputs/${receptor_site_file}

score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}

vdw_definition_file vdw.defn

chemical_definition_file chem.defn

chemical_score_file chem_score.tbl

flex_definition_file flex.defn

flex_drive_file flex_drive.tbl

ligand_contact_file dock_cnt.mol2

ligand_chemical_file dock_chm.mol2

ligand_energy_file dock_nrg.mol2
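The effect of substituting values into the parameterized input file can be emulated in a few lines. This is only an illustration: Python's `string.Template` happens to accept the same `$name` and `${name}` placeholder syntax, and it is used here as a stand-in, not as a description of how the Nimrod tools perform substitution. Only three lines of the template are reproduced, and the values passed in are example values.

```python
from string import Template

# Three lines taken from the parameterized Dock input file above.
dock_base = Template(
    "random_seed $random_seed\n"
    "ligand_atom_file ${ligand_number}.mol2\n"
    "receptor_site_file $HOME/dock_inputs/${receptor_site_file}\n"
)

def make_dock_run(values):
    """Fill in the job's parameter values.

    safe_substitute leaves placeholders with no supplied value (here $HOME)
    untouched, so they can still be resolved on the execution node.
    """
    return dock_base.safe_substitute(values)

dock_run = make_dock_run(
    {"random_seed": 7, "ligand_number": 5, "receptor_site_file": "ece.sph"}
)
print(dock_run)
```

Each job receives its own copy of the file in which only `ligand_number` differs, which is what turns one input file into thousands of independent docking runs.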

2. Create Docking Plan: Define Variables and Their Values

parameter database_name label "database_name" text select oneof "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre" "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n" "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute" "molecular_science" "molecular_diversity_preservation" "national_cancer_institute" "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE" default "aldrich_300";

parameter CDB_SERVER text default "";

parameter CDB_PORT_NO text default "5001";

parameter score_ligand text default "yes";

parameter minimize_ligand text default "yes";

parameter multiple_ligands text default "no";

parameter random_seed integer default 7;

parameter anchor_search text default "no";

parameter torsion_drive text default "yes";

parameter clash_overlap float default 0.5;

parameter conformation_cutoff_factor integer default 5;

parameter torsion_minimize text default "yes";

parameter match_receptor_sites text default "no";

. . . . . .

. . . . . .

parameter maximum_cycles integer default 1;

parameter receptor_site_file text default "ece.sph";

parameter score_grid_prefix text default "ece";

parameter ligand_number integer range from 1 to 2000 step 1;

Molecules to be screened
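How a plan like this turns into individual jobs can be sketched as a Cartesian product over the swept parameters. The sketch assumes that `range from 1 to 2000 step 1` includes both end points, and the helper `expand_jobs` is hypothetical; it is not part of Nimrod. Only a few of the fixed parameters are shown.

```python
from itertools import product

def expand_jobs(fixed, swept):
    """Yield one dict of parameter values per job (product of all swept ranges)."""
    names = list(swept)
    for combo in product(*(swept[name] for name in names)):
        job = dict(fixed)
        job.update(zip(names, combo))
        yield job

# Parameters with a single default value stay fixed for every job.
fixed = {"database_name": "aldrich_300", "random_seed": 7, "receptor_site_file": "ece.sph"}
# "range from 1 to 2000 step 1", assumed inclusive of both ends.
swept = {"ligand_number": range(1, 2001, 1)}

jobs = list(expand_jobs(fixed, swept))
print(len(jobs))                                             # 2000
print(jobs[0]["ligand_number"], jobs[-1]["ligand_number"])   # 1 2000
```

Because only `ligand_number` is swept, the plan yields one job per molecule; sweeping a second parameter would multiply the job count.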

3. Create Docking Plan File: Define the Task that Jobs Need to Do

task nodestart

copy ./parameter/vdw.defn node:.

copy ./parameter/chem.defn node:.

copy ./parameter/chem_score.tbl node:.

copy ./parameter/flex.defn node:.

copy ./parameter/flex_drive.tbl node:.

copy ./dock_inputs/get_molecule node:.

copy ./dock_inputs/dock_base node:.


task main

node:substitute dock_base dock_run

node:substitute get_molecule get_molecule_fetch

node:execute sh ./get_molecule_fetch

node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out

copy node:dock_out ./results/dock_out.$jobname

copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname

copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname

copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname


Chemical DataBase (CDB)

  • Databases consist of small molecules from commercially available organic synthesis libraries, and natural product databases.

  • There is also the ability to screen virtual combinatorial databases, in their entirety.

  • This methodology allows only the required compounds to be subjected to physical screening and/or synthesis, reducing both time and expense.

Target Testcase

  • The target for the test case: endothelin converting enzyme (ECE). This is involved in “heart stroke” and other transient ischemia.

  • Is·che·mi·a : A decrease in the blood supply to a bodily organ, tissue, or part caused by constriction or obstruction of the blood vessels.

Scheduling Molecular Docking Application on Grid: Experiment

  • Workload: docking 200 molecules with ECE

    • 200 jobs, each needing on the order of 3 minutes, depending on molecular weight.

  • Deadline: 60 min.; budget: 50,000 G$/tokens

  • Strategy: minimise time / cost

  • Execution cost under each optimisation strategy

    • Optimise Cost: 14,277 G$ (finished in 59.30 min.)

    • Optimise Time: 17,702 G$ (finished in 34 min.)

    • In this experiment, time-optimised scheduling cost about 3.4K G$ more than cost-optimised scheduling (17,702 − 14,277 = 3,425 G$).

    • Users can now trade off time against cost.
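The trade-off in these results can be checked with simple arithmetic. The sketch below uses only the figures reported above; it assumes that "59.30 min." means 59.3 decimal minutes, which the slide does not state.

```python
# Figures reported for the experiment (G$ spent, minutes to finish).
cost_optimised = {"cost": 14_277, "minutes": 59.30}  # 59.30 read as decimal minutes (assumption)
time_optimised = {"cost": 17_702, "minutes": 34.0}
budget = 50_000

extra_cost = time_optimised["cost"] - cost_optimised["cost"]
minutes_saved = cost_optimised["minutes"] - time_optimised["minutes"]

print(extra_cost)  # 3425
print(f"extra cost relative to cost-optimised run: {extra_cost / cost_optimised['cost']:.1%}")
print(f"minutes saved: {minutes_saved:.1f}")
print(f"G$ paid per minute saved: {extra_cost / minutes_saved:.0f}")
print(f"budget used (cost-optimised): {cost_optimised['cost'] / budget:.1%}")
print(f"budget used (time-optimised): {time_optimised['cost'] / budget:.1%}")
```

Both strategies stay well inside the 50,000 G$ budget and the 60-minute deadline; the difference is how much of the slack each one spends on money versus time.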

WWG Setup (WW Grid)

North America

  • NCSA: Cluster
  • Wisc: PC/cluster
  • NRC, Canada
  • Many others

Melbourne + Monash U: VPAC, Physics

  • MEG Visualisation
  • Solaris WS

Grid Market Directory

  • ZIB: T3E/Onyx
  • AEI: Onyx
  • CNR: Cluster
  • Poznan: SGI/SP2
  • Vrije U: Cluster
  • Cardiff: Sun E6500
  • Portsmouth: Linux PC
  • Manchester: O3K
  • Cambridge: SGI
  • Many others

  • AIST, Japan: Solaris Cluster
  • Osaka University: Cluster
  • Doshisha: Linux cluster
  • Korea: Linux cluster

DBC Scheduling for Time Optimization: No. of Jobs in Exec.

DBC Scheduling for Cost Optimization: No. of Jobs in Exec.
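As a rough illustration of what cost optimisation under a deadline and budget constraint (DBC) means, here is a simplified one-shot greedy sketch: fill the cheapest resources first, up to what each can finish before the deadline. It is not the Nimrod-G scheduler, which adapts to resource behaviour at run time; the `Resource` fields, prices, and speeds below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    price_per_job: float    # hypothetical G$ charged per job
    minutes_per_job: float  # hypothetical run time of one job
    cpus: int               # jobs the resource can run at once

def capacity(res, deadline):
    """Number of jobs a resource can finish before the deadline."""
    return res.cpus * int(deadline // res.minutes_per_job)

def schedule_cost_optimised(resources, num_jobs, deadline, budget):
    """Assign jobs to the cheapest resources first; fail if constraints cannot be met."""
    plan, remaining, spent = {}, num_jobs, 0.0
    for res in sorted(resources, key=lambda r: r.price_per_job):
        if remaining == 0:
            break
        take = min(remaining, capacity(res, deadline))
        if take:
            plan[res.name] = take
            remaining -= take
            spent += take * res.price_per_job
    if remaining or spent > budget:
        raise RuntimeError("deadline or budget cannot be met")
    return plan, spent

# Hypothetical resources; only the job count, deadline, and budget come from the experiment.
resources = [
    Resource("fast-expensive", price_per_job=120.0, minutes_per_job=2.0, cpus=10),
    Resource("slow-cheap", price_per_job=60.0, minutes_per_job=4.0, cpus=8),
]
plan, spent = schedule_cost_optimised(resources, num_jobs=200, deadline=60, budget=50_000)
print(plan)   # {'slow-cheap': 120, 'fast-expensive': 80}
print(spent)  # 16800.0
```

A time-optimising variant would instead spread jobs across all affordable resources to finish as early as possible, spending more of the budget in exchange for a shorter completion time.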

Summary and Conclusion

  • Applications can be Grid enabled and deployed on the Grid with minimal effort, but this requires the right set of Grid tools.

  • Distributed Docking demonstrates that Nimrod-G and Gridbus tools:

    • Enable rapid Grid application software engineering.

    • Provide powerful runtime machinery for optimal deployment of applications on the Grid.

  • Easy-to-use tools for composing applications to run on the Grid are essential to attracting the application community and getting it on board.

  • Ongoing work: integrating with our Data Grid Broker to support dynamic selection of CDB nodes (in progress).

Thanks


DBC Time Opt. Scheduling

DBC Scheduling for Time Optimization: No. of Jobs Finished

DBC Scheduling for Time Optimization: Budget Spent

DBC Cost Opt. Scheduling

DBC Scheduling for Cost Optimization: No. of Jobs Finished

DBC Scheduling for Cost Optimization: Budget Spent

Parametric Processing

Magic Engine for Manufacturing Humans!

Multiple Runs, Same Program, Multiple Data

Killer Application for the Grid!

Courtesy: Anand Natrajan, University of Virginia