Grids and Biology

Grids and Biology. Professor Carole Goble, University of Manchester, UK. BBSRC Bioinformatics and eScience Grant Holders Workshop, Warwick, UK, 28th October 2002. Outline: a take on the Grid; issues in bioinformatics for Grid; various BioGrids; applicability of Grid to biology.


Grids and Biology

Professor Carole Goble

University of Manchester, UK

BBSRC Bioinformatics and eScience Grant Holders Workshop, Warwick, UK

28th October 2002


Grids and Biology

A take on the Grid

Issues in Bioinformatics for Grid

Various BioGrids

Applicability of Grid to Biology

Reality check

What is the Grid?

“Grid computing [is] distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation ... we review the ‘Grid problem’, which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations.”

From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke

What is the Grid?
  • Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
  • On-demand, ubiquitous access to computing, data, and services
  • New capabilities constructed dynamically and transparently from distributed services
  • No central location, No central control, No existing trust relationships, Little predetermination
  • Uniformity for pooling resources
  • Virtual pools of resources: databases, clusters, …
Biology as a Grid Application
  • Informational Science
  • Large Scale
  • Distributed
  • No one organisation owns it all
Motivation

[Figure: chart over the years 1990, 2000 and 2010. Labels: Computational Load, Genome Data, Moore's Law, ESTs, Human Genome, Metabolic Pathways, Pharmacogenomics, Combinatorial Chemistry.]

BioMedical Computation

[Rick Stevens, Argonne Labs]

Biomedical Data: High Complexity and Large Scale

[Rick Stevens, Argonne Labs]

[Figure: biomedical data types, annotated with scales of hundred thousands, millions and billions of items.]
  • Proteins: sequence, 2° structure, 3° structure (example: MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYT...)
  • DNA sequences: alignments (example: ...atcgaattccaggcgtcacattctcaattcca...)
  • Protein-protein interactions: metabolism, pathways, receptor-ligand, 4° structure
  • Physiology: cellular biology, biochemistry, neurobiology, endocrinology, etc.
  • Polymorphism and variants: genetic variants, individual patients, epidemiology
  • ESTs: expression patterns, large-scale screens
  • Genetics and maps: linkage, cytogenetic, clone-based

BioGrid Projects
  • EUROGRID BioGRID
  • Asia Pacific BioGRID
  • North Carolina BioGrid
  • Bioinformatics Research Network
  • Osaka University BioGrid
  • Indiana University BioArchive BioGrid
  • myGrid
  • BioSim
  • e-Protein
  • ObiGrid
Today’s Grid

A Single System Image
  • Transparent wide-area access to large data banks
  • Transparent wide-area access to applications on heterogeneous platforms
  • Transparent wide-area access to processing resources

Security, certification, single sign-on authentication, AAA
  • Grid Security Infrastructure

Data access, transfer & replication
  • GridFTP, Giggle

Computational resource discovery, allocation and process creation
  • GRAM, Unicore, Condor-G
Immediate benefits
  • Uniform file views of directories, regardless of platform
  • Grid-based data transfer libraries for faster access to large files, reducing need for mirror-site servers.
  • Replication to support mirroring
  • Grid APIs provide a job manager with metadata about the services available to the user, so that service providers can be evaluated on factors that may include more than just server performance and availability.
  • Grid-aware applications -- split sequence reference libraries among several servers, where BLAST comparisons can be conducted in parallel.
  • Shielding from a variety of low-level computing problems that users would otherwise have to address themselves.
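
The idea of splitting a sequence reference library among several servers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shards are searched by local threads rather than remote Grid servers, the score is a toy count of shared 3-mers standing in for BLAST, and every name here (`split_library`, `search_shard`, `parallel_search`, `LIBRARY`) is hypothetical rather than part of any Grid toolkit.

```python
from concurrent.futures import ThreadPoolExecutor


def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_library(library, n_shards):
    """Deal the reference sequences round-robin into n_shards lists."""
    shards = [[] for _ in range(n_shards)]
    for i, entry in enumerate(library):
        shards[i % n_shards].append(entry)
    return shards


def search_shard(query, shard, k=3):
    """Score every (name, sequence) entry in one shard against the query.

    The score is the number of shared k-mers: a toy stand-in for BLAST.
    """
    q = kmers(query, k)
    return [(name, len(q & kmers(seq, k))) for name, seq in shard]


def parallel_search(query, library, n_shards=3, top=2):
    """Search all shards concurrently and merge the best hits."""
    shards = split_library(library, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        results = pool.map(lambda shard: search_shard(query, shard), shards)
        hits = [hit for shard_hits in results for hit in shard_hits]
    # Sort by descending score, then by name so that ties are deterministic.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:top]


# Made-up reference library used only for this illustration.
LIBRARY = [
    ("seqA", "atcgaattccaggcgtcaca"),
    ("seqB", "ttttttttttgggggggggg"),
    ("seqC", "ggcgtcacattctcaattcc"),
    ("seqD", "cccccccccccccccccccc"),
]

if __name__ == "__main__":
    print(parallel_search("caggcgtcacattc", LIBRARY))
```

Because each shard is scored independently and the hits are merged afterwards, the ranking does not depend on how many shards the library is split into; only the wall-clock time changes.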
Grid Landscape

Computationally Intensive

Collaborative

Visualisation

Data Intensive

Knowledge Intensive

Classical Grids

Classical Grids emphasise sharing of physical resources.

Existing Grid middleware (e.g. Globus, Condor, Unicore) allows resource discovery, resource allocation, data movement, certification …
High Performance Bioinformatics Software

[Jack da Silva, NCSC, Paracel]

Access portal for biomolecular modeling resources.
  • Interfaces that enable chemists and biologists to submit work to HPC facilities
  • Visualization of the electrostatic field generated by a molecule.

Dr Krzysztof Nowinski (ICM)

Biogrid system

[Figure: Biogrid system hardware diagram. Labels: SCORE Management Station (twice); Myrinet-2000; Connected to Grid system 3; Grid system 1; Express5800/ISS for PC-Cluster; Xeon 2.2G x 8 + Management node 1; Flat Neighborhood networks; 1000Base-SX; Grid system 2; NEC Blade Server, 78 nodes (156 CPUs); 1000Base-T x 12; Data Grid Disk; Express5800/140Ra-4 x 3.]

Remote control of instruments

[Figure: network diagram linking UHVEM (Osaka, Japan) and NCMIR (San Diego). Labels: Osaka University, JGN, Tokyo XP, TransPAC, APAN, STAR TAP (Chicago), vBNS, SDSC (UC San Diego).]

  • Sharing of the UHVEM (Ultra High Voltage Electron Microscope) at Osaka University with NCMIR (National Center for Microscopy and Imaging Research)
    • 3 million electron volts
    • the most powerful microscope
Home Computers Evaluate AIDS Drugs
  • Community =
    • 1000s of home computer users
    • Philanthropic computing vendor (Entropia)
    • Research group (Scripps)
  • Common goal = advance AIDS research

From Steve Tuecke 12 Oct. 01

Matlab

Geodise release in November 02

[email protected]

  • Matlab and toolboxes for mathematical computation, analysis, visualization, and algorithm development:

“MATLAB is an intuitive language and a technical computing environment. It provides core mathematics and advanced graphical tools for data analysis, visualization, and algorithm and application development. With more than 600 mathematical, statistical, and engineering functions, engineers and scientists rely on the MATLAB environment for their technical computing needs.”

(www.mathworks.com)

CROSS PLATFORM/ OS

BioSim -- Molecular simulations as a tool for protein structure analysis

[Sansom]

[Figure labels: synchrotron, compute GRID, MD database, novel biology…]

  • Overall vision – simulation as an integral component of structural genomics
  • Needs both capacity (many systems) and capability (large systems - HPCx)
  • Molecular Dynamics database (distributed)
Visualization + Bioinformatics

[Rick Stevens, Argonne Labs]

[Figure: diagram relating a Visualization Environment, Bioinformatic Analysis Tools, and Microbiology & Biochemistry. Labels: Genome Visualization Tools; Function Assignment; Whole Genome Analysis; Metabolic Reconstruction; Enzymatic Constants; Metabolic ***; Network Visualization Tools; Stoichiometric Representation & Flux Analysis; Proteomics; Interactive Stoichiometric Graphical Tools; Dynamic Simulation; Whole Cell Visualizations; Image/Spectra Augmentations; Laboratory Verification.]

X-ray microtomography
  • Scientific discovery can be enhanced by closely coupling computation and experiment: simulation, visualization and data gathering are coupled
  • X-ray microtomography produces 3D X-ray attenuation maps of specimens at a microscopic level
  • Expensive synchrotron beam time is used optimally to obtain sufficient resolution for simulation
Interactive Steering
  • User steers calculation from laptop
  • Controlled steering on supercomputers
  • Visualization and computation use large scale machines accessed via Grid.

Enables controlled simulation using the knowledge and skills of a trained scientist.

Scalable molecular dynamics
  • Structure of a protein in a fluid medium
  • Calculation takes into account forces between protein and ambient medium (in this case water molecules)
  • Run on the world's largest academic computer, LeMieux at PSC (6 Tflops theoretical peak)
[Figure labels: UCSF, UIUC]

From Klaus Schulten, Center for Biomolecular Modeling and Bioinformatics, Urbana-Champaign

Grid Landscape: DATA!!

Computationally Intensive

Collaborative

Visualisation

Data Intensive

Knowledge Intensive

Information Weaving and Question Answering
  • Large amounts of different kinds of data & many applications.
  • Highly heterogeneous.
    • Different types, algorithms, forms, implementations, communities, service providers
  • High autonomy.
  • Highly complex and inter-related, & volatile.
Annotation Pipeline

[Mike Sternberg]

[Figure: annotation pipeline diagram. Labels: proteome sequences; sequences; SCOP; CATH; PDB; NRPROT; INTERPRO; TM, CC, LC, SIG & MOTIFS; PSIBLAST & HMMs; PDB hit; no PDB hit; 3D modelling x 2; fold recognition x 2; structure-based function prediction; structural and functional annotation.]

myGrid

[Figure label: RASMOL]

  • Personalised extensible environments for data-intensive in silico experiments in biology
  • Straightforward discovery, interoperation, deployment & sharing of services
    • Service-oriented architecture
  • Integration and Information
    • Workflow & Databases
  • Experimentation
    • Provenance, propagating change, personalisation

For bioinformaticians who are building tools and using or providing services
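
As a rough illustration of how workflow and provenance fit together, the sketch below runs an in silico experiment as an ordered list of steps and logs a fingerprint of each step's input and output. It is not myGrid code: the `Workflow` class, the `digest` helper and the two toy steps are hypothetical names invented for this example.

```python
import hashlib
from dataclasses import dataclass, field


def digest(value):
    """Short, stable fingerprint of a value's textual form."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()[:12]


@dataclass
class Workflow:
    """An ordered list of named steps plus a provenance log of each run."""
    steps: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def add(self, name, func):
        self.steps.append((name, func))
        return self  # allow chained calls

    def run(self, data):
        for name, func in self.steps:
            result = func(data)
            # Record which step turned which input into which output.
            self.log.append(
                {"step": name, "input": digest(data), "output": digest(result)}
            )
            data = result
        return data


def transcribe(dna):
    """Hypothetical step: DNA to RNA by replacing t with u."""
    return dna.replace("t", "u")


def gc_content(rna):
    """Hypothetical step: fraction of g and c bases, rounded to 2 places."""
    return round(sum(base in "gc" for base in rna) / len(rna), 2)


if __name__ == "__main__":
    wf = Workflow().add("transcribe", transcribe).add("gc_content", gc_content)
    print(wf.run("atcgaattcc"))
    for record in wf.log:
        print(record)
```

Because each record stores the fingerprint of its input, a later change to an upstream data set shows up as a mismatch against the logged fingerprint, which is one simple way to notice that a derived result needs to be recomputed.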

DiscoveryNet

http://www.discovery-on-the.net/

  • Bio Chip Applications
    • Protein-folding chips: SNP chips, Diff. Gene chips using LFII
    • Protein-based fluorescent micro arrays
  • High Throughput Sensing (HTS) Applications
    • Large-scale Dynamic Real-time Decision support
    • Large-scale Dynamic System Knowledge Discovery
  • Based on Kensington Discovery Platform
    • Grid-based Knowledge Discovery: Grid-based Data Mining, Collaborative Visualisation
    • Information Structuring: Information Integration & Composition, Semantics & Domain-based Ontologies, Sharing
    • Distributed Data Engineering: Data Registration, Data Normalisation, Data Quality
  • Based on Globus & ORB Infrastructure
    • High Throughput Computing Services: Utilising Grid Infrastructure for HT Computing
  • Grid Basic Infrastructure: Globus/Condor/SRB

[Figure labels not placed above: 1-1000; 10-1000; >10000; Data Quality; Visualisation; Structuring; Clustering; Distributed Dynamic Knowledge Management.]

Grid Evolution
  • 1st Generation Grid
    • Computationally intensive, file access/transfer
    • Bag of various heterogeneous protocols & toolkits
    • Recognises internet, Ignores Web
    • Academic teams
  • 2nd Generation Grid
    • Data intensive -> knowledge intensive
    • Services-based architecture
    • Recognises Web and Web services
    • Global Grid Forum
    • Industry participation

We are here!

A Grid vs The Grid

A Grid of resources, not just compute resources but databases, digital libraries, instruments, workflows, documents …

These configurations are dynamic: resources are discovered, combined, used and disbanded as and when needed or available.

[Figure: layered diagram. Logical layer: NovartisGrid, BioSimGrid, MouseGrid. Grid Middleware. Physical layer: nodes on a Gigabit IP network, geographically distributed (e.g. UKGrid).]
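
The discover, combine, use and disband cycle can be pictured with a small Python sketch. It is only a model under loud assumptions: `Registry`, `Resource` and `Configuration` are hypothetical names, the registry is an in-memory set rather than real Grid middleware, and the resource and site names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str  # e.g. "compute", "database", "instrument"
    site: str


class Registry:
    """Hypothetical directory of resources currently offered to a Grid."""

    def __init__(self):
        self._resources = set()

    def register(self, resource):
        self._resources.add(resource)

    def withdraw(self, resource):
        self._resources.discard(resource)

    def discover(self, kind):
        """Return the available resources of one kind, sorted by name."""
        matches = (r for r in self._resources if r.kind == kind)
        return sorted(matches, key=lambda r: r.name)


class Configuration:
    """A temporary grouping of resources assembled for one task."""

    def __init__(self, registry, kinds):
        self.members = []
        for kind in kinds:
            found = registry.discover(kind)
            if not found:
                raise LookupError(f"no {kind} resource available")
            self.members.append(found[0])  # combine: take the first match

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.members.clear()  # disband when the task is finished
        return False


if __name__ == "__main__":
    registry = Registry()
    registry.register(Resource("cluster-a", "compute", "site-1"))
    registry.register(Resource("seqdb", "database", "site-2"))
    with Configuration(registry, ["compute", "database"]) as config:
        print([member.name for member in config.members])
```

The context manager mirrors the slide's point that a configuration exists only for as long as it is needed; a resource withdrawn from the registry simply stops appearing in later configurations.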

A configuration of resources

services

  • Not just compute services but databases, digital libraries, instruments, workflows, documents …

[Figure labels: Open Grid Service Architecture (OGSA); Grid Services; Web Services; Grid Technology]

Bio Services
  • Drug Discovery
  • Microbial Engineering
  • Molecular Ecology
  • Oncology Research

Domain Oriented Services

  • Integrated Databases
  • Sequence Analysis
  • Protein Interactions
  • Cell Simulation

Basic BioGrid Services

Grid Resource Services

  • Compute Services
  • Pipeline Services
  • Data Archive Service
  • Database Hosting
  • Workflow Enactment
  • Event notification

Common Services

Base Services

Fabric Services

What We Need to Create
  • Grid Bio applications enablement software layer
    • Provide applications with access to Grid services
    • Provide OS-independent services
  • Grid enabled version of bioinformatics data management tools (e.g. DL, SRS, etc.)
    • Need to support virtual databases via Grid services
    • Grid support for commercial databases
  • Bioinformatics applications “plug-in” modules
    • End user tools for a variety of domains
    • Support major existing Bio IT platforms
Requirements for the BioGrid
  • Open and extendable architecture
    • Enable tie in to service stack at appropriate points
    • Not just access via Portals
  • Leverage scripting tools in wide use for Bioinformatics
    • Create BioGrid services bindings for Perl and Python
  • Address data federation and integration
    • Leverage work of IBM, Lion BioSciences, DAS, BioMOBY, etc.
  • Match the biology workflow and tool chain
    • Create high-level BioGrid services to address critical stages in existing workflow
    • Support composability of new BioGrid tools with existing tool chain elements
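
One way to picture a scripting-language binding that composes with an existing tool chain is sketched below in Python. Everything in it is an assumption made for illustration: `ServiceBinding`, `fake_transport` and the `reverse_complement` service are hypothetical, and the "remote" call is dispatched locally so that the example runs without a network.

```python
from functools import reduce


def compose(*stages):
    """Chain stages left to right into a single callable."""
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)


class ServiceBinding:
    """Hypothetical binding that makes a remote service look like a function.

    The transport is injected, so a real binding could swap in a network call.
    """

    def __init__(self, name, transport):
        self.name = name
        self.transport = transport

    def __call__(self, payload):
        return self.transport(self.name, payload)


def fake_transport(service_name, payload):
    """Stand-in for a remote call: dispatches to local functions."""
    complement = str.maketrans("acgt", "tgca")
    handlers = {
        "reverse_complement": lambda seq: seq[::-1].translate(complement),
    }
    return handlers[service_name](payload)


def clean(seq):
    """Existing local tool: strip whitespace and lower-case the sequence."""
    return "".join(seq.split()).lower()


def count_bases(seq):
    """Existing local tool: tally each base."""
    return {base: seq.count(base) for base in "acgt"}


if __name__ == "__main__":
    pipeline = compose(
        clean,
        ServiceBinding("reverse_complement", fake_transport),
        count_bases,
    )
    print(pipeline("ATC GAA"))
```

Because the binding is just a callable, it slots between two existing local tools without either of them knowing that one stage might run elsewhere.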
Some BioGrid Challenges
  • Scalable human bioinformatics expertise
    • Best people working on the important problems
    • Exploit collaboration technology to create world class teams
  • Robust local bioinformatics computing environment
    • Best systems administrators and high-end technologies
    • Embed local resources into the Grid via portal technologies
  • Access to leading edge bioinformatics software and databases customized to user needs
    • Core content from top scientists and developers
    • Integrated access to biological databases
  • Worldwide access to robust computing and database infrastructure
    • Leverage Grid technology to provide worldwide access
    • Integrate purpose built systems and service providers
Reality Checks!!
  • The Technology is Ready
    • Not true: it’s emerging
      • Building middleware, advancing standards, developing dependability
      • Building demonstrators.
      • The computational grid is in advance of the data intensive middleware
      • Integration and curation are probably the obstacles
      • But!! It doesn’t have to be all there to be useful.
  • We know how we will use grid services
    • No — Disruptive technology
      • Lower the barriers of entry.
Reality Checks!!
  • It’s the only game
    • Not true — I3C, BioMOBY, bioDAS, OMG LSR
      • Grid and Web service merge makes integration likely.
  • One Size Fits All
    • Not true
      • Addressed by a minimum set of composable virtual services, but starting with Globus
  • It’s only for “big” science
    • No — “small” science collaborates too!
  • Biology is not unique!
    • AstroGrid
Not a silver bullet!

It’s just middleware, not magic

  • Data quality
  • Content management of databases (controlled vocabularies)
  • Provenance and versioning policies
  • Appropriate use of tools
  • Computational inaccessibility of free text annotation
  • Database accessibility through means other than point and click web interfaces.

Independent of the Grid!

Life Sciences Grid (LSG)

http://people.cs.uchicago.edu/~dangulo/LSG/
