Virtual laboratory enabling distributed molecular modelling for drug discovery on the grid

WW Grid

Virtual Laboratory: Enabling Distributed Molecular Modelling for Drug Discovery on the Grid

Rajkumar Buyya

Grid Computing and Distributed Systems (GRIDS) Lab.
The University of Melbourne, Melbourne, Australia
www.gridbus.org/vlab/


Agenda

  • Introduction

    • Molecular Docking Application Needs

  • Virtual Lab Architecture

  • Grid Enabling CDB (chemical databases)

  • Application Composition

  • Scheduling Experiments

  • Conclusions


Drug Design: Data Intensive Computing on Grid

  • It involves screening millions of chemical compounds (molecules) in the Chemical DataBase (CDB) to identify those with the potential to serve as drug candidates.

[Figure: molecules, a protein, and Chemical Databases (legacy, in .MOL2 format)]

[Collaboration with WEHI for Medical Science, Melbourne]


Using Basic Job Submission Commands

Do it all yourself! (manually)

Total Cost: $???


Build Distributed Application & Scheduler

Build the application on a case-by-case basis

Complicated construction

E.g., MPI based

Total Cost: $???


Rapid Parameterisation and Deployment Using the Gridbus and Nimrod-G Tools

Compose, Submit, & Play!


Docking Application Requirements

[Figure: Chemical Databases (legacy, in .MOL2 format)]

  • It is compute intensive:

    • Each docking job can take from a few minutes to hours, depending on the structural complexity of the molecule.

  • It is data intensive:

    • The databases are huge (MBs to GBs) and each contains thousands of molecules. Screening all molecules in all databases is a real data challenge!

  • CDBs are distributed.

  • It is a killer application for the Grid.
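A rough estimate shows why the screening workload calls for many machines. The sketch below is a back-of-envelope model that assumes every docking job takes the same time and each job screens exactly one molecule. The function name `screening_estimate` is hypothetical. The example inputs (2,000 molecules at about 3 minutes each, with a 30-minute deadline) are illustrative values echoing figures quoted elsewhere in this talk, not measurements.

```python
import math


def screening_estimate(n_molecules: int, minutes_per_job: float,
                       deadline_min: float) -> tuple[float, int]:
    """Return (total CPU-minutes, minimum number of concurrent CPUs).

    Assumes an ideal task farm: one molecule per job, identical job times,
    no data-transfer or scheduling overhead.
    """
    if minutes_per_job > deadline_min:
        raise ValueError("a single job cannot finish before the deadline")
    total_cpu_min = n_molecules * minutes_per_job
    # Each CPU completes floor(deadline / job time) whole jobs before the deadline.
    jobs_per_cpu = math.floor(deadline_min / minutes_per_job)
    min_cpus = math.ceil(n_molecules / jobs_per_cpu)
    return total_cpu_min, min_cpus


# Illustrative numbers: 2,000 molecules, ~3 minutes each, 30-minute deadline.
total, cpus = screening_estimate(2000, 3.0, 30.0)
print(total, cpus)  # 6000.0 200
```

Even under these optimistic assumptions, a single workstation would need 100 hours for this run. Meeting a 30-minute deadline requires about 200 CPUs working at once.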


DataGrid Brokering

“Screen 2K molecules in 30min. for $10”

[Figure: DataGrid brokering. Components: Nimrod/G Computational Grid Broker (Algorithm1 . . . AlgorithmN), CDB Broker, Data Replica Catalogue, Grid Info. Service, CDB Services, and Grid Service Providers GSP1, GSP2, GSP3, GSP4, . . ., GSPm, GSPn. The figure numbers its interactions 1 to 7; the messages shown are: “advise CDB source?”, “CDB replicas please?”, “Is GSP4 healthy?”, “selection & advise: use GSP4!”, “Screen mol.5 please?”, “mol.5 please?”, and “process & send results”.]
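The figure does not say how the CDB broker chooses among replicas. One plausible rule, sketched below purely as an assumption, works in two steps. First, drop any replica that the information service reports as unhealthy. Then pick the replica with the smallest estimated fetch time, computed as latency plus record size over bandwidth. The `Replica` fields, the cost model, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One copy of a chemical database (CDB) held by a Grid Service Provider."""
    provider: str          # e.g. "GSP4"
    healthy: bool          # hypothetical answer to "Is GSPx healthy?"
    bandwidth_mbps: float  # hypothetical bandwidth to the compute node
    latency_s: float       # hypothetical round-trip latency in seconds


def estimated_fetch_time(replica: Replica, record_mb: float) -> float:
    """Latency plus transfer time (seconds) for one molecule record."""
    return replica.latency_s + (record_mb * 8) / replica.bandwidth_mbps


def select_replica(replicas: list[Replica], record_mb: float) -> str:
    """Pick the healthy replica with the lowest estimated fetch time."""
    candidates = [r for r in replicas if r.healthy and r.bandwidth_mbps > 0]
    if not candidates:
        raise LookupError("no healthy CDB replica available")
    best = min(candidates, key=lambda r: estimated_fetch_time(r, record_mb))
    return best.provider


replicas = [
    Replica("GSP1", healthy=True, bandwidth_mbps=10.0, latency_s=0.30),
    Replica("GSP3", healthy=False, bandwidth_mbps=100.0, latency_s=0.01),
    Replica("GSP4", healthy=True, bandwidth_mbps=50.0, latency_s=0.05),
]
print(select_replica(replicas, record_mb=0.01))  # GSP4
```

GSP3 would be fastest but is excluded as unhealthy, so the broker would "advise: use GSP4!".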


Software Tools

  • Molecular Modelling Application (DOCK)

  • Parameter Modelling Tools (Nimrod/enFusion)

  • Grid Resource Broker (Nimrod-G)

  • Data Grid Broker

  • Chemical DataBase (CDB) Management and Intelligent Access Tools

    • PDB database Lookup/Index Table Generation.

    • PDB and associated index-table Replication.

    • PDB Replica Catalogue (that helps in Resource Discovery).

    • PDB Servers (that serve PDB client requests).

    • PDB Brokering (Replica Selection).

    • PDB Clients for fetching Molecule Record (Data Movement).

  • Grid Middleware (Globus and GrACE)

  • Grid Fabric Management (Fork/LSF/Condor/Codine/…)
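Index-table generation and record fetching can be sketched as follows. The sketch makes two assumptions: each database is a single multi-molecule MOL2 file, in which every record begins with a `@<TRIPOS>MOLECULE` line, and the index is just a list of byte offsets. The function names are hypothetical and are not the Virtual Lab's own tools.

```python
import io

MOLECULE_TAG = b"@<TRIPOS>MOLECULE"  # first line of every record in a Tripos MOL2 file


def build_index(stream) -> list[int]:
    """Lookup/index table: byte offset of each molecule record, in file order.

    Entry n-1 holds the offset of molecule n. `stream` must be a binary,
    seekable file object positioned at the start of the database.
    """
    offsets = []
    pos = stream.tell()
    for line in iter(stream.readline, b""):
        if line.startswith(MOLECULE_TAG):
            offsets.append(pos)
        pos += len(line)
    return offsets


def fetch_record(stream, offsets: list[int], n: int) -> bytes:
    """Return the raw MOL2 text of molecule n (1-based) without rescanning the file."""
    if not 1 <= n <= len(offsets):
        raise IndexError(f"molecule {n} is not in this database")
    start = offsets[n - 1]
    if n < len(offsets):
        end = offsets[n]
    else:
        end = stream.seek(0, io.SEEK_END)  # last record runs to end of file
    stream.seek(start)
    return stream.read(end - start)


data = (b"@<TRIPOS>MOLECULE\nmol_1\n@<TRIPOS>ATOM\n"
        b"@<TRIPOS>MOLECULE\nmol_2\n@<TRIPOS>ATOM\n")
with io.BytesIO(data) as cdb:
    index = build_index(cdb)
    print(index)                        # [0, 38]
    print(fetch_record(cdb, index, 2))  # b'@<TRIPOS>MOLECULE\nmol_2\n@<TRIPOS>ATOM\n'
```

Because the index stores offsets, a CDB server can seek directly to molecule n instead of scanning a multi-gigabyte file for every request. The same small table is also what would be replicated alongside each database copy.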


The Virtual Lab: Software Stack

  • APPLICATIONS: Molecular Modelling for Drug Design

  • PROGRAMMING TOOLS: Nimrod and Virtual Lab Tools [parametric programming language, GUI tools, and CDB indexer]

  • USER LEVEL MIDDLEWARE: Nimrod-G and CDB Data Broker [task farming engine, scheduler, dispatcher, agents, CDB (chemical database) server]

  • CORE MIDDLEWARE: Globus [security, information, job submission]

  • FABRIC: Worldwide Grid [distributed computers and databases with different Arch, OS, and local resource management systems]

(Databases labelled in the figure: CDB, PDB.)


V-Lab Components Interaction

[Figure: interaction between the User Node, Grid Node, Compute Node, and CDB Service on Grid. Components: Grid Tools and Applications; Nimrod-G Grid Broker (Task Farming Engine, Grid Scheduler, Grid Dispatcher); Grid Info Server; Grid Trade Server; Process Server; Nimrod Agent; Local Resource Manager; User Process; Docking Process; CDB Client; File Server; CDB Server; Index and CDB1 . . . CDBm. Messages shown: “Do this in 30 min. for $10?”, “Get molecule “n” record from “abc” CDB”, “Molecule “n” Location?”, “Get mol. record”, and “File access”.]
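The figure shows a docking job's CDB client asking a CDB server for molecule “n” of database “abc”. The slides do not define the wire protocol, so the sketch below invents a one-line text protocol (`GET <database> <n>`) purely for illustration. The protocol, `CDB`, `CDBHandler`, `start_server`, and `get_molecule` are all hypothetical. A real deployment would point the client at the plan's `CDB_SERVER` / `CDB_PORT_NO` rather than at a local test server.

```python
import socket
import socketserver
import threading

# Hypothetical in-memory CDB: database name -> MOL2 records (molecule n is at index n-1).
CDB = {
    "abc": [
        "@<TRIPOS>MOLECULE\nmol_1\n",
        "@<TRIPOS>MOLECULE\nmol_2\n",
    ],
}


class CDBHandler(socketserver.StreamRequestHandler):
    """Answers one hypothetical request line of the form 'GET <database> <n>'."""

    def handle(self):
        parts = self.rfile.readline().decode("ascii", "replace").split()
        try:
            verb, db, n = parts[0], parts[1], int(parts[2])
            if verb != "GET" or not 1 <= n <= len(CDB[db]):
                raise LookupError(n)
            reply = "OK\n" + CDB[db][n - 1]
        except (ValueError, LookupError):  # bad number, short request, unknown db/molecule
            reply = "ERR\n"
        self.wfile.write(reply.encode("ascii"))


def start_server() -> socketserver.ThreadingTCPServer:
    """Start a CDB server on a free localhost port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), CDBHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_molecule(host: str, port: int, db: str, n: int) -> str:
    """CDB client: fetch the record of molecule n from database db."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(f"GET {db} {n}\n".encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closes the connection after replying
                break
            chunks.append(chunk)
    status, _, body = b"".join(chunks).decode("ascii").partition("\n")
    if status != "OK":
        raise LookupError(f"molecule {n} not available from CDB {db!r}")
    return body


server = start_server()
host, port = server.server_address
print(get_molecule(host, port, "abc", 2), end="")  # prints the mol_2 record
server.shutdown()
server.server_close()
```

In the plan shown later, this role is played by the `get_molecule` script that each job runs before invoking DOCK.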


DOCK code* (Enhanced by WEHI, U of Melbourne)

  • A program to evaluate the chemical and geometric complementarity between a small molecule and a macromolecular binding site.

  • It explores ways in which two molecules, such as a drug and an enzyme or protein receptor, might fit together.

  • Compounds which dock to each other well, like pieces of a three-dimensional jigsaw puzzle, have the potential to bind.

  • So, why is it important to be able to identify small molecules which may bind to a target macromolecule?

  • A compound which binds to a biological macromolecule may inhibit its function, and thus act as a drug.

  • E.g., disabling the ability of a virus (such as HIV) to attach itself to a molecule/protein!

  • With system-specific code changed, we have been able to compile it for Sun Solaris, PC Linux, SGI IRIX, and Compaq Alpha/OSF1.

* Original Code: University of California, San Francisco: http://www.cmpharm.ucsf.edu/kuntz/


Dock Input File

Molecule to be screened: S_1.mol2 (ligand_atom_file)

score_ligand yes
minimize_ligand yes
multiple_ligands no
random_seed 7
anchor_search no
torsion_drive yes
clash_overlap 0.5
conformation_cutoff_factor 3
torsion_minimize yes
match_receptor_sites no
random_search yes
. . . . . .
maximum_cycles 1
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
score_grid_prefix ece
vdw_definition_file parameter/vdw.defn
chemical_definition_file parameter/chem.defn
chemical_score_file parameter/chem_score.tbl
flex_definition_file parameter/flex.defn
flex_drive_file parameter/flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2


1. Parameterize the Dock Input File (use Nimrod Tools: GUI/language)

Molecule to be screened: ${ligand_number}.mol2 (ligand_atom_file)

score_ligand $score_ligand
minimize_ligand $minimize_ligand
multiple_ligands $multiple_ligands
random_seed $random_seed
anchor_search $anchor_search
torsion_drive $torsion_drive
clash_overlap $clash_overlap
conformation_cutoff_factor $conformation_cutoff_factor
torsion_minimize $torsion_minimize
match_receptor_sites $match_receptor_sites
random_search $random_search
. . . . . .
maximum_cycles $maximum_cycles
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
vdw_definition_file vdw.defn
chemical_definition_file chem.defn
chemical_score_file chem_score.tbl
flex_definition_file flex.defn
flex_drive_file flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
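The `node:substitute dock_base dock_run` step in the plan's main task turns this template into a concrete input file for one job. Python's standard `string.Template` happens to use the same `$name` / `${name}` placeholder syntax, so it gives a rough model of that step. This is an assumption-laden sketch, not Nimrod's implementation. In particular, it assumes that placeholders without a job value (such as `$HOME`) should be left for the node to resolve, which is why it calls `safe_substitute`. `DOCK_BASE`, `substitute`, and the job values are illustrative.

```python
from string import Template

# A few lines of the parameterised DOCK input file from the slide.
DOCK_BASE = """\
score_ligand $score_ligand
random_seed $random_seed
clash_overlap $clash_overlap
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
"""


def substitute(template_text: str, values: dict) -> str:
    """Fill $name / ${name} placeholders with one job's parameter values.

    safe_substitute leaves placeholders that have no value (here $HOME)
    untouched instead of raising KeyError.
    """
    return Template(template_text).safe_substitute(
        {name: str(value) for name, value in values.items()})


# Hypothetical values for one job (ligand number 42).
job = {"score_ligand": "yes", "random_seed": 7, "clash_overlap": 0.5,
       "ligand_number": 42, "receptor_site_file": "ece.sph"}
print(substitute(DOCK_BASE, job), end="")
# score_ligand yes
# random_seed 7
# clash_overlap 0.5
# ligand_atom_file 42.mol2
# receptor_site_file $HOME/dock_inputs/ece.sph
```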


2. Create Docking Plan: Define Variables and Their Values

parameter database_name label "database_name" text select oneof "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre" "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n" "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute" "molecular_science" "molecular_diversity_preservation" "national_cancer_institute" "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE" default "aldrich_300";
parameter CDB_SERVER text default "bezek.dstc.monash.edu.au";
parameter CDB_PORT_NO text default "5001";
parameter score_ligand text default "yes";
parameter minimize_ligand text default "yes";
parameter multiple_ligands text default "no";
parameter random_seed integer default 7;
parameter anchor_search text default "no";
parameter torsion_drive text default "yes";
parameter clash_overlap float default 0.5;
parameter conformation_cutoff_factor integer default 5;
parameter torsion_minimize text default "yes";
parameter match_receptor_sites text default "no";
. . . . . .
parameter maximum_cycles integer default 1;
parameter receptor_site_file text default "ece.sph";
parameter score_grid_prefix text default "ece";
parameter ligand_number integer range from 1 to 2000 step 1;

Molecules to be screened: ligand_number (1 to 2000)
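The effect of these declarations can be sketched as a cross product. Each parameter that takes several values (here the `ligand_number` range) multiplies the number of jobs, while each parameter left at its default contributes a single value. This is a simplified model that assumes one job per combination of values. `expand_plan` is a hypothetical helper, not part of Nimrod.

```python
from itertools import product


def expand_plan(sweeps: dict[str, list], fixed: dict) -> list[dict]:
    """One parameter assignment per job: the cross product of all swept values,
    with every fixed (default-only) parameter copied into each job."""
    names = list(sweeps)
    jobs = []
    for combo in product(*(sweeps[name] for name in names)):
        job = dict(fixed)
        job.update(zip(names, combo))
        jobs.append(job)
    return jobs


# "parameter ligand_number integer range from 1 to 2000 step 1;" gives 2,000 values.
sweeps = {"ligand_number": list(range(1, 2001, 1))}
fixed = {"database_name": "aldrich_300", "random_seed": 7, "clash_overlap": 0.5}
jobs = expand_plan(sweeps, fixed)
print(len(jobs), jobs[0]["ligand_number"], jobs[-1]["ligand_number"])  # 2000 1 2000

# Hypothetically sweeping a second parameter over two values doubles the job count.
both = expand_plan({**sweeps, "database_name": ["aldrich_300", "maybridge_300"]}, {})
print(len(both))  # 4000
```

Because every job is independent, the broker is free to place each one on whichever resource best meets the deadline and budget.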


3. Create Docking Plan File: Define the Tasks That Jobs Need to Do

task nodestart
copy ./parameter/vdw.defn node:.
copy ./parameter/chem.defn node:.
copy ./parameter/chem_score.tbl node:.
copy ./parameter/flex.defn node:.
copy ./parameter/flex_drive.tbl node:.
copy ./dock_inputs/get_molecule node:.
copy ./dock_inputs/dock_base node:.
endtask

task main
node:substitute dock_base dock_run
node:substitute get_molecule get_molecule_fetch
node:execute sh ./get_molecule_fetch
node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
copy node:dock_out ./results/dock_out.$jobname
copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
endtask



Chemical DataBase (CDB)

  • Databases consist of small molecules from commercially available organic synthesis libraries and from natural product databases.

  • Virtual combinatorial databases can also be screened in their entirety.

  • This methodology allows only the required compounds to be subjected to physical screening and/or synthesis, reducing both time and expense.


Target Testcase

  • The target for the test case is endothelin converting enzyme (ECE), which is involved in “heart stroke” and other transient ischemia.

  • Is·che·mi·a: a decrease in the blood supply to a bodily organ, tissue, or part caused by constriction or obstruction of the blood vessels.



Scheduling Molecular Docking Application on Grid: Experiment

  • Workload: docking 200 molecules with ECE

    • 200 jobs, each needing on the order of 3 minutes, depending on molecular weight.

  • Deadline: 60 min.; budget: 50,000 G$ (tokens)

  • Strategy: minimise time / minimise cost

  • Execution cost under each strategy:

    • Optimise Cost: 14,277 G$ (finished in 59.30 min.)

    • Optimise Time: 17,702 G$ (finished in 34 min.)

    • In this experiment, time-optimised scheduling cost about 3.4K G$ more than cost-optimised scheduling (17,702 - 14,277 = 3,425 G$).

    • Users can now trade off time against cost.
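The slides report measured outcomes of Nimrod-G's deadline and budget constrained (DBC) scheduling but do not spell out its algorithms. To illustrate the time/cost trade-off, here is a minimal sketch of two simplified greedy policies, assuming fixed per-job prices and runtimes. It is not Nimrod-G's implementation, and the resources, prices, and runtimes are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    cpus: int
    price_per_job: float    # hypothetical G$ charged per docking job
    minutes_per_job: float  # hypothetical runtime of one docking job here


def cost_optimise(resources, n_jobs, deadline_min, budget):
    """Simplified DBC cost optimisation: fill the cheapest resources first,
    giving each only as many jobs as it can finish before the deadline."""
    plan, remaining, cost = {}, n_jobs, 0.0
    for r in sorted(resources, key=lambda res: res.price_per_job):
        capacity = r.cpus * math.floor(deadline_min / r.minutes_per_job)
        take = min(capacity, remaining)
        if take > 0:
            plan[r.name] = take
            cost += take * r.price_per_job
            remaining -= take
        if remaining == 0:
            break
    if remaining > 0 or cost > budget:
        raise RuntimeError("deadline or budget cannot be met")
    return plan, cost


def time_optimise(resources, n_jobs, deadline_min, budget):
    """Simplified DBC time optimisation: give each job to the resource that
    would finish it earliest, as long as the remaining budget can pay for it."""
    free_at = {r.name: [0.0] * r.cpus for r in resources}  # per-CPU free time
    plan, cost, makespan = {}, 0.0, 0.0
    for _ in range(n_jobs):
        best = None  # (finish time, resource); ties go to the earlier resource
        for r in resources:
            if cost + r.price_per_job > budget:
                continue
            finish = min(free_at[r.name]) + r.minutes_per_job
            if best is None or finish < best[0]:
                best = (finish, r)
        if best is None or best[0] > deadline_min:
            raise RuntimeError("deadline or budget cannot be met")
        finish, r = best
        cpu_free = free_at[r.name]
        cpu_free[cpu_free.index(min(cpu_free))] = finish  # occupy earliest-free CPU
        plan[r.name] = plan.get(r.name, 0) + 1
        cost += r.price_per_job
        makespan = max(makespan, finish)
    return plan, cost, makespan


# Hypothetical testbed: a cheap small cluster and a larger, pricier one.
resources = [
    Resource("cheap", cpus=5, price_per_job=50, minutes_per_job=3),
    Resource("fast", cpus=20, price_per_job=100, minutes_per_job=3),
]
print(cost_optimise(resources, 200, 60, 50_000))  # ({'cheap': 100, 'fast': 100}, 15000.0)
print(time_optimise(resources, 200, 60, 50_000))  # ({'cheap': 40, 'fast': 160}, 18000.0, 24.0)
```

With these made-up numbers, the cost-optimised plan keeps the cheap cluster busy for the full 60-minute deadline and spends less. The time-optimised plan finishes after 24 minutes but costs more. This is the same qualitative trade-off as in the experiment, though the figures themselves are not comparable.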


WWG Setup

[Figure: World Wide Grid (WWG) testbed map; sites connected over the Internet]

  • Australia: Melbourne + Monash U: VPAC, Physics

  • North America: ANL: SGI/Sun/SP2; NCSA: Cluster; Wisc: PC/cluster; NRC, Canada; many others

  • Europe: ZIB: T3E/Onyx; AEI: Onyx; CNR: Cluster; CUNI/CZ: Onyx; Poznan: SGI/SP2; Vrije U: Cluster; Cardiff: Sun E6500; Portsmouth: Linux PC; Manchester: O3K; Cambridge: SGI; many others

  • Asia: AIST, Japan: Solaris Cluster; Osaka University: Cluster; Doshisha: Linux cluster; Korea: Linux cluster

  • Also shown: Gridbus+Nimrod-G, Grid Market Directory, GMonitor, MEG Visualisation, Solaris WS



DBC Scheduling for Time Optimization: No. of Jobs in Execution


DBC Scheduling for Cost Optimization: No. of Jobs in Execution


Summary and Conclusion

  • Applications can be Grid-enabled and deployed on the Grid with minimal effort, but they need the right set of Grid tools.

  • Distributed docking demonstrates that the Nimrod-G and Gridbus tools:

    • Enable rapid Grid application software engineering.

    • Provide powerful runtime machinery for optimal deployment of applications on the Grid.

  • Easy-to-use tools for composing Grid applications are essential to attracting the application community and getting it on board.

  • In progress: integration with our Data Grid Broker to support dynamic selection of CDB nodes.


Thanks

http://www.gridbus.org/vlab


DBC Time Opt. Scheduling


DBC Scheduling for Time Optimization: No. of Jobs Finished


DBC Scheduling for Time Optimization: Budget Spent


DBC Cost Opt. Scheduling


DBC Scheduling for Cost Optimization: No. of Jobs Finished


DBC Scheduling for Cost Optimization: Budget Spent


Parametric Processing

[Figure: Parameters; “Magic Engine for Manufacturing Humans!”; Multiple Runs, Same Program, Multiple Data]

Killer Application for the Grid!

Courtesy: Anand Natrajan, University of Virginia

