Everything you always wanted to know about the Grid and never dared to ask

Tony Hey and Geoffrey Fox

Outline • Lecture 1: Origins of the Grid – The Past to the Present (TH) • Lecture 2: Web Services, Globus, OGSA and the Architecture of the Grid (GF) • Lecture 3: Data Grids, Computing Grids and P2P Grids (GF) • Lecture 4: Grid Functionalities – Metadata, Workflow and Portals (GF) • Lecture 5: The Future of the Grid - e-Science to e-Business (TH)

Lecture 1 Origins of the Grid – The Past to the Present [Grid Computing Book: Chs 1,2,3,4,5,6,36]

Lecture 5 The Future of the Grid – e-Science to e-Business [Grid Computing Book: Chs 38, 39, 40,41,42,43]

Lecture 5 • e-Science Research and the Future of Scientific Research • Computer Science Research Issues • A Business Case for the Grid • Concluding Remarks

e-Science and the Future of Scientific Research ‘e-Science will change the dynamic of the way science is undertaken.’ John Taylor, 2001

Integrated e-Science Environment: framework for distributed scientific computing and experimentation • “Problem Solving Environments”: domain-specific application interfaces for scientists • Grid services: computing, data discovery, data visualisation, experiment control • Grid services middleware: authentication, authorisation, accounting • Resources, local and remote: computers, data storage, experiments

e-Science Examples • Particle Physics • Virtual Observatories • e-Engineering • e-Chemistry • Bioinformatics • High-Throughput Applications • e-Health

DAME Project • In-flight data • Global network, e.g. SITA • Ground station • Airline • DS&S Engine Health Center • Maintenance centre • Data centre • Internet, e-mail, pager

Comb-e-Chem Structure + Properties Knowledge + Prediction Structures DB Properties DB Simulation and calculation

Combinatorial Chemistry • Parallel synthetic approach • create hundreds of materials • screen properties to find those that fit the bill • Typically requires several passes • find chemical structure of the best candidates • create new batches of similar materials for subsequent passes • Leads to explosive growth in: • volume of data generated • potential to exploit this data

Array production of different chemical species • Reagents: MeOH, EtOH, PrOH, BuOH and R1COOH, R2COOH, R3COOH, R4COOH • Same reaction sequence for all combinations • Monitor & analysis data • Interface to Grid
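
The "same reaction sequence for all combinations" idea on this slide can be sketched in a few lines. This is only an illustration, not the Comb-e-Chem software: the reagent labels are copied from the slide, while the function name `build_array` is an assumption made for the example.

```python
from itertools import product

# Reagent labels as they appear on the slide (R1..R4 are placeholders there too).
alcohols = ["MeOH", "EtOH", "PrOH", "BuOH"]
acids = ["R1COOH", "R2COOH", "R3COOH", "R4COOH"]


def build_array(first, second):
    """Return one (first, second) pair per array cell: every combination once.

    `build_array` is a hypothetical helper; the same reaction sequence would
    then be applied to each pair.
    """
    return list(product(first, second))


array = build_array(alcohols, acids)
print(len(array))  # 4 alcohols x 4 acids = 16 combinations
```

Scaling either list grows the array multiplicatively, which is the "explosive growth" in data volume that the combinatorial chemistry slide refers to.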

High throughput systems • Library synthesis: well plate with typically 96 or 384 cells • Structure and properties analysis: mass spec, Raman, X-ray, databases • 100,000s of compounds at a time • Produces huge amounts of complex data

Remote (Dark) Laboratory • Remote equipment, multiple users, few experts • Data & control links • Access Grid links • Model for National Crystallography Service (NCS)

NCS Workflow • Send sample material to NCS service • Collaborate in e-Lab experiment and obtain structure • Search materials database and predict properties using Grid computations • Download full data on materials of interest

NCS Portal Access

NCS Experimental Services

NCS Lab Service

myGrid Project • Imminent ‘deluge’ of data • Highly heterogeneous • Highly complex and inter-related • Convergence of data and literature archives

myGrid: Generic Technologies • Database access from the Grid • Process enactment on the Grid • Personalisation services • Metadata services • Development of Agent Services • Ultimate goal is to put Grid Services together with Ontologies to develop ‘Semantic Grid’

Workflow • Know how. • Associate base resources with derived data. • Keep, describe, find, compare, protect, share. • Repeat/reuse/re-enact • Specialise/Customise/Personalise • Evolution – notification, knowledge • Quality & best practice • Need the workflows to be effective • Good experimental practice.

Personalisation • Dynamic creation of personal data sets • Personal views over repositories • Personalisation of workflows • Personal notification • Annotation of datasets and workflows • Personalisation of service descriptions – ‘what I think the service does’

Provenance • Who, what, where, why, when, how? • The traceability of knowledge as it evolves and as it is derived. • Identity – the Life Sciences ID • Lab Books, Methods in papers. • Immutable Metadata • Migration – travels with its data but may not be stored with it. • Private vs Shared provenance records. • Ownership/credit
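
A minimal sketch of what an immutable provenance record might look like, assuming a simple in-memory store. None of this is the myGrid implementation: the class `ProvenanceRecord`, its field names, the `urn:example:` identifier and the sample values are all hypothetical, chosen to mirror the "who, what, where, why, when, how" bullet.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen=True makes the metadata immutable after creation
class ProvenanceRecord:
    data_id: str          # identifier of the data item described (hypothetical scheme)
    who: str
    what: str
    where: str
    why: str
    when: datetime
    how: str
    shared: bool = False  # private by default; True marks a shared record


record = ProvenanceRecord(
    data_id="urn:example:dataset-1",  # illustrative only, not a real Life Sciences ID
    who="alice",
    what="sequence alignment",
    where="lab-server",
    why="annotate gene",
    when=datetime(2003, 1, 1, tzinfo=timezone.utc),
    how="blast, default parameters",
)

# The record is keyed by the data identifier, so it can be kept apart from the
# data it describes while still being found whenever that data moves.
provenance_store = {record.data_id: record}


def is_immutable(rec):
    """Return True if assigning to a field of the record is rejected."""
    try:
        rec.who = "bob"
    except FrozenInstanceError:
        return True
    return False
```

Keying the store on the identifier rather than embedding the record in the dataset is one way to read "travels with its data but may not be stored with it".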

Discovery Net Project: Scientific Discovery In Real Time • Scientific information: literature, databases, operational data, images, instrument data • Real time integration • Dynamic application integration • Workflow construction • Interactive visual analysis • Using distributed resources

Discovery Process Management • Workflow = Service Composition + Discovery Pathway • Towards a Standard Workflow Representation for Discovery Informatics: Discovery Process Markup Language (DPML): • Discovery Pathway Construction: Recording and managing a collaboratively-built discovery process • Distributed Service Composition: Components organised by the workflow can be executing anywhere • Discovery Pathway as Key Intellectual Property: Discovery Processes can be stored, reused, audited, refined and deployed in various forms • D-Net Workflow for Genome Annotation: 16 services executing across the Internet
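
The equation "Workflow = Service Composition + Discovery Pathway" can be illustrated with a toy workflow object that both chains services and records what each step produced, so the run can be audited or re-enacted later. This is a sketch under stated assumptions and does not use DPML: the `Workflow` class, its methods and the two string-processing "services" are invented for the example.

```python
from typing import Callable, List, Tuple

Service = Callable[[str], str]  # a "service" here is just a local function


class Workflow:
    """Hypothetical workflow: an ordered composition of services plus a log."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: List[Tuple[str, Service]] = []
        self.pathway: List[Tuple[str, str]] = []  # (step label, output) per step

    def add(self, label: str, service: Service) -> "Workflow":
        self.steps.append((label, service))
        return self  # allow chained calls

    def run(self, data: str) -> str:
        self.pathway = []  # start a fresh record for this run
        for label, service in self.steps:
            data = service(data)
            self.pathway.append((label, data))
        return data


wf = (
    Workflow("toy-annotation")
    .add("uppercase", str.upper)
    .add("drop-N", lambda s: s.replace("N", ""))
)
result = wf.run("acgtnnacgt")
```

After the run, `wf.steps` is the reusable composition and `wf.pathway` is the recorded trail; in a real deployment each service could be executing on a different machine.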

Dynamic Integration Services • Dynamic Application Integration = On-demand access and composition of remote analysis components • Towards a Dynamic Component Integration: • Knowledge Servers: allow users to register, locate and remotely execute components • Execution Servers: allow users to control the execution of components in distributed environments • Easy Maintenance: New components can be added through a clean API • D-NET API components: clustering, classification, text analysis, gene function prediction, promoter prediction, homology search
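
The register, locate and execute roles of a Knowledge Server can be sketched as a small registry. This is not the D-NET API: the class `KnowledgeServer`, its method signatures, the tags and the two sample components are assumptions for illustration, and execution here is a local function call standing in for a remote one.

```python
class KnowledgeServer:
    """Hypothetical in-memory registry of analysis components."""

    def __init__(self):
        self._components = {}  # name -> (callable, frozenset of tags)

    def register(self, name, func, tags=()):
        """Add a component under a name, with optional descriptive tags."""
        self._components[name] = (func, frozenset(tags))

    def locate(self, tag):
        """Return the sorted names of all components carrying the tag."""
        return sorted(
            name for name, (_, tags) in self._components.items() if tag in tags
        )

    def execute(self, name, *args):
        """Run a registered component (locally here; remotely in a real Grid)."""
        func, _ = self._components[name]
        return func(*args)


server = KnowledgeServer()
server.register(
    "gc_content",
    lambda seq: (seq.count("G") + seq.count("C")) / len(seq),
    tags={"sequence"},
)
server.register("word_count", lambda text: len(text.split()), tags={"text"})

found = server.locate("sequence")
fraction = server.execute("gc_content", "GCAT")
```

New components are added with one `register` call and no change to callers, which is the "easy maintenance" point on the slide.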

Case Study: SC2002 HPC Challenge • D-Net based Global Collaborative Real-Time Genome Annotation • High Throughput Sequencers • 15 DBs, 21 Applications • Nucleotide-level Annotation (diagram labels): Identify Organism Chromosomes Organism’s DNA Identify Genes tRNAs, rRNAs Gene markers Non-translated RNAs EMBL NCBI genscan blast Regulatory Regions Repetitive Elements TIGR SNP grail RepeatMasker Segmental Duplication SNP Variations Literature References ….. E-PCR genscan • Protein-level Annotation (diagram labels): Identify Proteins Classify into Protein Families InterPro InterPro blast 3D-PSSM Functional Characterisation Homologues SMART SWISS-PROT Domain 3-D Structure PFAM Motif Search Fold Prediction Secondary structure predator DSC Literature References ….. • Process-level Annotation (diagram labels): Relate Pathway Maps Ontologies Cell Cycle Metabolism GO CSNDB Drugs Biological Process….. AmiGO GeneMaps KEGG GK Cell death Embryogenesis virtual chip GenNav Literature References …..

How It Works: Nucleotide Annotation Workflows • Download sequence from Reference Server • Execute distributed annotation workflow • Databases: KEGG, InterPro, SMART, SWISS-PROT, EMBL, NCBI, TIGR, SNP, GO • Interactive Editor & Visualisation • Save to Distributed Annotation Server • 500 Web accesses • 1800 clicks • 200 copy/paste operations • 3 weeks' work in 1 workflow and a few seconds' execution

eDiamond: Applications of SMF • Teleradiology and QC: VirtualMammo • Training and Differential Diagnosis: “Find one like it” • Advanced CAD: SMF-CAD workstation • Epidemiology: SMF-computed breast density

Image guided interventions. Images courtesy of Derek Hill, Guy’s Hospital

Image guided interventions (2) Images Courtesy Guy’s Hospital

Surgical verification: Accuracy of surgical placement against plan • Surgeon plans on X-ray or CT, uses database of prostheses • Operation takes place using plan as guidance • Post operative X-ray evaluated for accuracy of placement • Data stored and used for short term assessment and long term evaluation studies. Courtesy of Ian Revie, DePuy International

Summary • UK e-Science projects emphasize data federation and integration as much as computation • Metadata and ontologies key to higher level Grid services • e-Science projects will produce a deluge of scientific data that will need to be annotated and curated in scientific data ‘digital libraries’

Databases in the Grid • Figure axes: Data Complexity, Computational Complexity • Regions shown: Classical Web, Semantic Web, Classical Grid

OGSA-DAI Project • Key middleware project for UK Program - Total Budget £3M (CP £1.5M) • Three Centres involved: - Edinburgh, Manchester and Newcastle • Industrial partners: - IBM US, IBM Hursley and Oracle UK • Goal is to develop high-quality data-centric middleware

OGSA-DAI Project • Design Specification completed • Papers for GGF WG on Database Access and Integration Services • Alpha versions delivered: • Distributed Query Service • XML Database Interface • Relational Database Interface • Beta versions by April 2003 • Integrate with Globus GT3 release

e-Science and the Future of Scientific Research ‘e-Science will change the dynamic of the way science is undertaken.’ John Taylor, 2001 • Need to break down the barriers between the Victorian ‘bastions’ of science – biology, chemistry, physics, …. • Develop ‘permeable’ structures that promote rather than hinder multidisciplinary collaboration • Engage Computing Services and Libraries in developing a new e-Science support service on Campus

e-Science and Computer Science • The lesson of the Web • The Semantic Grid • The myGrid project • The Discovery Net Project • Computer Science Research and the Grid

Error 404: Page not found ‘If you want the Web to scale, You must allow the links to fail’ Wendy Hall after Tim Berners-Lee • HTML as the ‘Fortran’ of Hypertext!

Semantic Web

Metadata & Ontologies • Metadata – computationally accessible data about the services • Ontologies – the shared and common understanding of a domain • A vocabulary of terms • Definition of what those terms mean. • A shared understanding for people and machines • Usually organised into a taxonomy.

Reasoning in DAML+OIL • Consistency: check if knowledge is meaningful • Subsumption: structure knowledge, compute classification • Equivalence: check if two classes denote the same set of instances • Instantiation: check if an individual is an instance of class C • Retrieval: retrieve the set of individuals that instantiate C
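
The five reasoning tasks can be made concrete on a toy class hierarchy. This is a plain graph walk, not a description-logic reasoner and not DAML+OIL syntax: the class names, the individuals, the disjointness axiom and all function names are hypothetical example data. Two classes are treated as equivalent when each subsumes the other.

```python
# Toy ontology (hypothetical data): class -> set of direct superclasses.
SUPERCLASSES = {
    "Molecule": set(),
    "Protein": {"Molecule"},
    "NucleicAcid": {"Molecule"},
    "Enzyme": {"Protein", "Biocatalyst"},
    "Biocatalyst": {"Enzyme"},  # mutual subclassing makes the two equivalent
    "Kinase": {"Enzyme"},
}
INSTANCES = {"hexokinase": "Kinase", "insulin": "Protein"}  # individual -> class
DISJOINT = [("Protein", "NucleicAcid")]  # pairs that may share no subclass


def ancestors(cls):
    """All classes reachable upward from cls (cycle-safe)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUPERCLASSES.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific):
    """Subsumption: every instance of `specific` is an instance of `general`."""
    return general == specific or general in ancestors(specific)


def equivalent(a, b):
    """Equivalence: each class subsumes the other."""
    return subsumes(a, b) and subsumes(b, a)


def instance_of(individual, cls):
    """Instantiation: is the individual an instance of cls?"""
    return subsumes(cls, INSTANCES[individual])


def retrieve(cls):
    """Retrieval: all individuals that instantiate cls."""
    return sorted(i for i in INSTANCES if instance_of(i, cls))


def consistent():
    """Consistency: no class falls under both members of a disjoint pair."""
    for cls in SUPERCLASSES:
        up = ancestors(cls) | {cls}
        if any(a in up and b in up for a, b in DISJOINT):
            return False
    return True
```

For example, `retrieve("Enzyme")` finds `hexokinase` only through subsumption, since it is asserted as a `Kinase` and never directly as an `Enzyme`.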

Computer Science Challenges from e-Science • UK CS Team led by Tom Rodden identified 4 major research challenges arising from e-Science: - Developing a Semantic Grid - Trusted Ubiquitous Systems - Rapid Customized Assembly of Services - Autonomic Computing

Towards a Semantic Grid • Trace provenance from initial data to information and knowledge structures • Techniques to allow scalable reasoning over uncertain/incomplete knowledge • Tools for design, development and deployment of large-scale ontologies • Support for semantic-directed knowledge discovery to complement data-mining • Development of flexible network-based reasoning and decision support services

Trusted Ubiquitous Systems • New theories to model, specify and analyse trust in distributed ubiquitous systems • New quality of service and service-based models for ubiquitous systems • New design guidelines and practices to enable the development of reusable trusted components • New understanding of the practical engineering trade-offs required to realise trusted ubiquitous systems

Rapid Customised Assembly of Services • New theories to describe and reason about semantics and behaviour of services and compositional effects • Agent and service representations that promote adaptability and emergent, opportunistic and implicit arrangement of services • New tools to support the discovery, composition and use of services based on high-level description of requirements • Techniques to support directed automatic composition, decomposition and recomposition of services

Autonomic Computing • Techniques to analyze, describe and reason about adaptive systems • Management of semi-autonomous systems with policies, services and software agents • Interoperability and reasoning across and between different autonomous domains • Modeling and measurement of performance of QoS for autonomic structures • Techniques to capture and represent history, context and environment

IBM Autonomic Computing Vision • Self-Configuring: adapt automatically to dynamically changing environments • Self-Healing: discover, diagnose, and react to disruptions • Self-Protecting: anticipate, detect, identify, and protect against attacks from anywhere • Self-Optimizing: monitor and tune resources automatically

A Business Case for the Grid • Total Cost of Ownership – TCO • Value of Open Standards • Industrial Applications • Time to exploitation • e-Utilities
