Cloud computing for chemical property prediction

Microsoft Cloud Futures Conference, Redmond, 8th April 2010. Cloud Computing for Chemical Property Prediction. Paul Watson, School of Computing Science, Newcastle University, UK. [email protected] The team: David Leahy, Jacek Cala, Hugo Hiden,


Presentation Transcript


Microsoft Cloud Futures Conference, Redmond, 8th April 2010

Cloud Computing for Chemical Property Prediction

Paul Watson

School of Computing Science

Newcastle University, UK

[email protected]

  • The team: David Leahy, Jacek Cala, Hugo Hiden, Dominic Searson, Vladimir Sykora, Martyn Taylor, Simon Woodman

  • With thanks to:

    • Microsoft External Research for their financial support for Project Junior

    • Christophe Poulain, Savas Parastatidis


Chemists want to know:

Q1. What are the properties of this molecule?

Toxicity

Biological Activity

Solubility

Q2. What molecule would have aqueous solubility of 0.1 μg/mL?


Answering the question by performing experiments

... time-consuming, expensive, and raises ethical issues


An alternative to experimentation: QSAR

Quantitative Structure Activity Relationship

- predict properties based on similar molecules

Activity ≈ f(quantifiable structural attributes), e.g.

  • #atoms

  • logP

  • shape

  • ...
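
As a rough illustration of Activity ≈ f(descriptors), here is a minimal sketch that fits a straight line to a single descriptor. The descriptor (logP), the training values and the function names are invented for this example. Real QSAR models combine many descriptors and use measured data.

```python
# Illustrative only: the descriptor values and activities below are invented.

def fit_line(xs, ys):
    """Ordinary least squares for activity ≈ slope * descriptor + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, descriptor):
    """Apply the fitted relationship to a new molecule's descriptor value."""
    slope, intercept = model
    return slope * descriptor + intercept


# Hypothetical training set: logP of known molecules and their activities.
logp = [0.5, 1.0, 1.5, 2.0, 2.5]
activity = [2.0, 3.0, 4.0, 5.0, 6.0]  # constructed as 2 * logP + 1

model = fit_line(logp, activity)
predicted = predict(model, 3.0)  # prediction for a molecule not in the training set
```

The prediction step corresponds to question Q1 above: given a new molecule's descriptors, estimate a property without running an experiment.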


Generating the models: Discovery Bus (Leahy et al.)

  • Data + Model-Builders → Models → www.openqsar.com

  • New Data or Model-Builders → Model Generation → New/Improved Models


Model-building workflow (diagram):

  • Chemical Structures & their Activities

  • Separate Training & Test Data → Training Data, Test Data

  • Calculate Descriptors from Structures → Descriptors + Responses

  • Combine Descriptors → Combined Descriptors + Responses

  • Filter Descriptors → Selected Descriptors + Responses

  • Build & Test Models Independently: Multiple Linear Regression, Neural Network, Partial Least Squares, Classification Trees, ...

  • Select Best Models

  • Add to Model Database
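
The workflow above can be sketched in a few lines. This is a simplified sketch, not the Discovery Bus implementation. It assumes the descriptors are already calculated (one descriptor per molecule), skips the combine and filter steps, and uses two toy model builders in place of the methods named on the slide. All names and data are hypothetical.

```python
import math
import random


def split(records, test_fraction=0.25, seed=0):
    """Separate training and test data; records are (descriptor, response) pairs."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def build_mean_model(train):
    """Baseline builder: always predicts the mean training response."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def build_linear_model(train):
    """One-descriptor least-squares builder."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def rmse(model, test):
    """Root-mean-square error of a model on held-out test data."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in test) / len(test))


MODEL_BUILDERS = {"mean": build_mean_model, "linear": build_linear_model}


def run_workflow(records):
    """Build every model on the training data, test each one, select the best."""
    train, test = split(records)
    scores = {name: rmse(build(train), test) for name, build in MODEL_BUILDERS.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Invented data: the response is an exact linear function of the descriptor.
records = [(float(x), 3.0 * x + 2.0) for x in range(12)]
best_name, scores = run_workflow(records)
```

Each builder sees only the training data and is scored on the held-out test data, which is what lets the models be built and tested independently of one another.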


Increasing amounts of data for model building...

ChEMBL:

data on 622,824 compounds,

collected from 33,956 publications

WOMBAT:

data on 251,560 structures,

for over 1,966 targets

WOMBAT-PK:

data on 1230 compounds,

for over 13,000 clinical measurements

All contain structure information & numerical activity data

→ More models

→ Better models

→ Computationally expensive:

    • 5 years for new datasets on existing server


JUNIOR Project Aim

Use Azure to generate models in weeks, not years

... using as much of the available data as possible

... making the models available on www.openqsar.com

... so that researchers can generate predictions for their own molecules


Potential for concurrency...

  • Chemical Structures & their Activities

  • Separate Training & Test Data → Training Data, Test Data

  • Calculate Descriptors from Structures → Descriptors + Responses

  • Combine Descriptors → Combined Descriptors + Responses

  • Filter Descriptors → Selected Descriptors + Responses

  • Build & Test Models Independently: Multiple Linear Regression, Neural Network, Partial Least Squares, Classification Trees, ...

  • Select Best Models

  • Add to Model Database
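
Because each model is built and tested independently, the build step fans out naturally. The sketch below shows that pattern with a local thread pool standing in for cloud worker nodes. The builder names, descriptor sets and the placeholder score are all hypothetical. A real job would train a model and return its test error.

```python
from concurrent.futures import ThreadPoolExecutor


def build_and_test(task):
    """Stand-in for one independent 'build & test model' job."""
    builder_name, descriptor_set = task
    # Placeholder score so the example runs; not a real model-quality measure.
    score = len(builder_name) + len(descriptor_set)
    return builder_name, descriptor_set, score


builders = ["mlr", "neural_net", "pls", "class_tree"]
descriptor_sets = [("logp",), ("logp", "n_atoms"), ("logp", "n_atoms", "shape")]

# One task per (model builder, descriptor set) combination.
tasks = [(b, d) for b in builders for d in descriptor_sets]

# No task depends on another, so they can all run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(build_and_test, tasks))

# "Select Best Models": here, simply the lowest placeholder score.
best = min(results, key=lambda r: r[2])
```

The number of tasks is the product of builders and descriptor sets, so it grows quickly as either list grows. That is the kind of bursty load the slides aim to run on Azure.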


Approach

  • avoid rewriting all existing Discovery Bus software

  • move existing Discovery Bus to Amazon Cloud

    • without parallelisation

  • move critical tasks to run concurrently on Azure

  • base solution around e-Science Central ....


Clouds to the rescue?

  • Building scalable, dependable, science applications is still hard .....

  • e-Science Central

    • Science Cloud Platform


Science Cloud Options

  • Option 1 (layers, top to bottom): Users; Science App 1 ... Science App n; Cloud Infrastructure: Storage & Compute

  • Option 2 (layers, top to bottom): Users; Science App 1 ... Science App n; Science Cloud Platform; Cloud Infrastructure: Storage & Compute


e-Science Central

  • Science as a Service, for users: Users; Science App 1 ... Science App n

  • Science Cloud Platform, for developers

  • Cloud Infrastructure: Storage & Compute


What should the Science Cloud Platform Include?

Identify the common needs of our e-Science Users

Data (instruments, experimental data, sensors...)


e-Science Central architecture (diagram):

  • Apps: App ... App

  • App API

  • Science Cloud Platform: Analysis Services, Security, Social Networking, Provenance, Workflow Enactment, Metadata

  • Cloud Infrastructure: Processing, Storage


Discovery Bus with e-Science Central (diagram):

  • Amazon: Discovery Bus Planner

  • App API

  • e-Science Central: Analysis Services, Security, Social Networking, Provenance, Workflow Enactment, Metadata

  • Azure: Processing, Storage


Workflow invocation (diagram):

1. Discovery Bus invokes e-Science Central Workflow via API

2. Workflow decomposed to Message Plan

3. Temporary workflow storage assigned, Message Plan queued for execution

4. Messages sent in sequence:

    • Call Message → Internal Service (RMI / JMS, NFS) → Response Message

    • Call Message → Azure Service (HTTP, HTTP Post) → Response Message

5. Workflow Execution Completes

    • Discovery Bus notified with results

    • Results data stored in e-Science Central folder
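
The invocation steps above can be sketched as a small in-process simulation. This is a sketch under assumptions, not the e-Science Central API. The class, function and service names are hypothetical, and the two transport functions only stand in for the RMI / JMS and HTTP POST calls named on the slide.

```python
from dataclasses import dataclass


@dataclass
class CallMessage:
    service: str    # name of the service to invoke (hypothetical)
    transport: str  # "internal" or "azure"
    payload: dict


def decompose(workflow):
    """Turn an ordered list of (service, transport) steps into a message plan."""
    return [CallMessage(service, transport, {}) for service, transport in workflow]


def send_internal(message, storage):
    # Stand-in for a call to an internal service (RMI / JMS in the slides).
    return f"internal result of {message.service}"


def send_azure(message, storage):
    # Stand-in for an HTTP POST to a service hosted on Azure.
    return f"azure result of {message.service}"


TRANSPORTS = {"internal": send_internal, "azure": send_azure}


def execute_plan(plan, notify):
    """Send each call message in sequence, then notify the caller."""
    storage = {}  # stands in for the workflow's temporary storage
    for message in plan:
        response = TRANSPORTS[message.transport](message, storage)
        storage[message.service] = response
    notify(storage)  # the invoking system is notified with the results
    return storage


notifications = []
workflow = [("calculate_descriptors", "internal"), ("build_models", "azure")]
results = execute_plan(decompose(workflow), notifications.append)
```

Keeping the transport choice inside each message means the same plan executor can drive both local services and remote Azure services.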


e-Science Central on Azure (diagram):

  • Web Node

  • Worker Node (three shown)

  • Blob Storage

  • Queue

  • Results
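
One common way to wire up components like these is a queue of jobs consumed by a pool of workers. The sketch below simulates that locally. The slide does not say how the queue, worker nodes and blob storage interact, so the roles here are assumptions: jobs travel through the queue, inputs are read from blob storage, and outputs go to a results collection. All names and data are hypothetical, and threads stand in for worker nodes.

```python
import queue
import threading


def worker(jobs, results, blob_storage):
    """Pull job names from the queue until a None stop signal arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        data = blob_storage[job]       # read input from (simulated) blob storage
        results.put((job, sum(data)))  # placeholder computation


# A dict stands in for blob storage; the values are invented inputs.
blob_storage = {"dataset_a": [1, 2, 3], "dataset_b": [10, 20], "dataset_c": [5]}
jobs, results = queue.Queue(), queue.Queue()

workers = [
    threading.Thread(target=worker, args=(jobs, results, blob_storage))
    for _ in range(3)
]
for t in workers:
    t.start()
for name in blob_storage:
    jobs.put(name)
for _ in workers:
    jobs.put(None)  # one stop signal per worker
for t in workers:
    t.join()

# All workers have finished, so the results queue can be drained safely.
collected = {}
while not results.empty():
    name, value = results.get()
    collected[name] = value
```

Workers take jobs as they become free, so the worker count can change without changing the code that submits jobs.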


Current Status – Work in Progress

  • Running across up to 100 Azure nodes

  • Azure utilisation increasing (average ~60% over runs)

  • Moving more admin tasks to Azure / e-Science Central

    • model validation

    • co-ordination
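
For a sense of scale, here is a back-of-envelope estimate combining figures quoted in the slides: 5 years on the existing server, up to 100 Azure nodes, and about 60% average utilisation. It rests on two assumptions that the slides do not state: that one Azure node is roughly as fast as the existing server, and that the work divides evenly across nodes. Treat the result as indicative only.

```python
# Figures taken from the slides.
server_years = 5    # estimated time for new datasets on the existing server
azure_nodes = 100   # upper bound on Azure nodes in use
utilisation = 0.60  # average utilisation over runs

# Assumptions (not from the slides): one node is about as fast as the existing
# server, and the workload is perfectly parallel.
effective_nodes = azure_nodes * utilisation
estimated_weeks = server_years * 52 / effective_nodes

print(f"{estimated_weeks:.1f} weeks")
```

Under those assumptions the estimate is a little over four weeks, consistent with the stated aim of generating models in weeks rather than years.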


CPU Utilization


Summary

  • Discovery Bus exemplifies a good Cloud pattern

    • large, variable, bursty requirements

    • proposal to apply to software verification

  • clouds do NOT make it easier to build complex, scalable, dependable distributed systems

    • we need higher-level “Science Cloud Platforms”

    • e-Science Central is our attempt at this

  • using the Azure Cloud we have a scalable system that can handle large new datasets

  • the models are being made freely available

    • www.openqsar.com

