
Everything you always wanted to know about the Grid and never dared to ask

Everything you always wanted to know about the Grid and never dared to ask. Tony Hey and Geoffrey Fox. Outline. Lecture 1: Origins of the Grid – The Past to the Present (TH) Lecture 2: Web Services, Globus, OGSA and the Architecture of the Grid (GF)


Presentation Transcript


  1. Everything you always wanted to know about the Grid and never dared to ask Tony Hey and Geoffrey Fox

  2. Outline • Lecture 1: Origins of the Grid – The Past to the Present (TH) • Lecture 2: Web Services, Globus, OGSA and the Architecture of the Grid (GF) • Lecture 3: Data Grids, Computing Grids and P2P Grids (GF) • Lecture 4: Grid Functionalities – Metadata, Workflow and Portals (GF) • Lecture 5: The Future of the Grid - e-Science to e-Business (TH)

  3. Lecture 1 Origins of the Grid – The Past to the Present [Grid Computing Book: Chs 1,2,3,4,5,6,36]

  4. Lecture 5 The Future of the Grid – e-Science to e-Business [Grid Computing Book: Chs 38, 39, 40,41,42,43]

  5. Lecture 5 • e-Science Research and the Future of Scientific Research • Computer Science Research Issues • A Business Case for the Grid • Concluding Remarks

  6. e-Science and the Future of Scientific Research ‘e-Science will change the dynamic of the way science is undertaken.’ John Taylor, 2001

  7. Integrated e-Science Environment: framework for distributed scientific computing and experimentation • “Problem Solving Environments”: domain-specific application interfaces for scientists • Grid services: Computing, Data discovery, Data visualisation, Experiment control • Grid services middleware: Authentication, Authorisation, Accounting • Resources (each local and remote): Computers, Data storage, Experiments

  8. e-Science Examples • Particle Physics • Virtual Observatories • e-Engineering • e-Chemistry • Bioinformatics • High-Throughput Applications • e-Health

  9. DAME Project In flight data Global Network eg: SITA Ground Station Airline DS&S Engine Health Center Maintenance Centre Internet, e-mail, pager Data centre

  10. Comb-e-Chem Structure + Properties Knowledge + Prediction Structures DB Properties DB Simulation and calculation

  11. Combinatorial Chemistry • Parallel synthetic approach • create hundreds of materials • screen properties to find those that fit the bill • Typically requires several passes • find chemical structure of the best candidates • create new batches of similar materials for subsequent passes • Leads to explosive growth in: • volume of data generated • potential to exploit this data

  12. Array production of different chemical species • Alcohols: MeOH, EtOH, PrOH, BuOH • Acids: R1COOH, R2COOH, R3COOH, R4COOH • Same reaction sequence for all combinations • Monitor & Analysis Data • Interface to Grid
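
The "same reaction sequence for all combinations" idea on this slide can be sketched as a Cartesian product over the two reagent lists. This is only an illustration: the reagent labels come from the slide, while the function name `plan_array` and the row/column layout are assumptions made for the example, not part of the Comb-e-Chem software.

```python
from itertools import product

# Reagent labels as they appear on the slide.
alcohols = ["MeOH", "EtOH", "PrOH", "BuOH"]
acids = ["R1COOH", "R2COOH", "R3COOH", "R4COOH"]


def plan_array(alcohols, acids):
    """Return one (row, column, alcohol, acid) entry per well of the array.

    Hypothetical helper: rows index the alcohols, columns index the acids.
    """
    return [
        (row, col, alcohol, acid)
        for (row, alcohol), (col, acid) in product(
            enumerate(alcohols), enumerate(acids)
        )
    ]


array_plan = plan_array(alcohols, acids)

# Every alcohol meets every acid exactly once: 4 x 4 = 16 wells.
for row, col, alcohol, acid in array_plan:
    print(f"well ({row}, {col}): {alcohol} + {acid}")
```

With a 96- or 384-cell plate the same product simply runs over longer reagent lists, which is why the volume of data grows so quickly.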

  13. High throughput systems • Well plate with typically 96 or 384 cells • Library synthesis • Structure and properties analysis: Mass Spec, Raman, x-ray • Databases • 100,000s of compounds at a time • Produces huge amounts of complex data

  14. Remote (Dark) Laboratory • Remote equipment, multiple users, few experts • Diagram labels: Users, Experiment, Expert, Data & control links, Access Grid links • Model for National Crystallographic Service (NCS)

  15. NCS Workflow • Send sample material to NCS service • Collaborate in e-Lab experiment and obtain structure • Search materials database and predict properties using Grid computations • Download full data on materials of interest

  16. NCS Portal Access

  17. NCS Experimental Services

  18. NCS Lab Service

  19. myGrid Project • Imminent ‘deluge’ of data • Highly heterogeneous • Highly complex and inter-related • Convergence of data and literature archives

  20. myGrid: Generic Technologies • Database access from the Grid • Process enactment on the Grid • Personalisation services • Metadata services • Development of Agent Services • Ultimate goal is to put Grid Services together with Ontologies to develop ‘Semantic Grid’

  21. Workflow • Know-how. • Associate base resources with derived data. • Keep, describe, find, compare, protect, share. • Repeat/reuse/re-enact • Specialise/Customise/Personalise • Evolution – notification, knowledge • Quality & best practice • Need the workflows to be effective • good experimental practice.

  22. Personalisation • Dynamic creation of personal data sets • Personal views over repositories • Personalisation of workflows • Personal notification • Annotation of datasets and workflows • Personalisation of service descriptions – ‘what I think the service does’

  23. Provenance • Who, what, where, why, when, how? • The traceability of knowledge as it evolves and as it is derived. • Identity – the Life Sciences ID • Lab Books, Methods in papers. • Immutable Metadata • Migration – travels with its data but may not be stored with it. • Private vs Shared provenance records. • Ownership/credit
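
One way to picture the "who, what, where, why, when, how" record and the "immutable metadata" bullet is a frozen record that links back to the record it was derived from. The sketch below is an assumption-laden illustration: the class `ProvenanceRecord`, the helper `lineage` and all field values are hypothetical and do not describe how myGrid actually stores provenance.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ProvenanceRecord:
    who: str
    what: str
    where: str
    why: str
    when: datetime
    how: str
    derived_from: Optional["ProvenanceRecord"] = None


def lineage(record):
    """Walk back from a derived result to the initial data."""
    steps = []
    while record is not None:
        steps.append(record.what)
        record = record.derived_from
    return steps


# Made-up example values, for illustration only.
raw = ProvenanceRecord(
    who="alice", what="raw measurements", where="lab A",
    why="initial experiment", how="instrument run",
    when=datetime(2003, 1, 1, tzinfo=timezone.utc),
)
derived = ProvenanceRecord(
    who="bob", what="derived result", where="lab B",
    why="analysis", how="workflow run",
    when=datetime(2003, 1, 2, tzinfo=timezone.utc),
    derived_from=raw,
)

print(lineage(derived))
```

Because each record only references its predecessor, the provenance chain can travel with the data without being stored in the same place.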

  24. Discovery Net Project In Real Time Scientific Information Scientific Discovery Real Time Integration Workflow Construction Operational Data Literature Instrument Data Databases Interactive Visual Analysis Dynamic Application Integration Using Distributed Resources Images

  25. Discovery Process Management • Workflow = Service Composition + Discovery Pathway • Towards a Standard Workflow Representation for Discovery Informatics: Discovery Process Markup Language (DPML): • Discovery Pathway Construction: Recording and managing a collaboratively-built discovery process • Distributed Service Composition: Components organised by the workflow can be executing anywhere • Discovery Pathway as Key Intellectual Property: Discovery Processes can be stored, reused, audited, refined and deployed in various forms • D-Net Workflow for Genome Annotation: 16 services executing across the Internet
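
The equation "Workflow = Service Composition + Discovery Pathway" can be illustrated with a small sketch that chains services and records which ones ran. This is not DPML and not the Discovery Net API: `run_workflow` and the two stand-in services are hypothetical, and real services would run remotely rather than as local functions.

```python
from typing import Callable, List, Tuple

Service = Callable[[str], str]


def run_workflow(services: List[Tuple[str, Service]], data: str):
    """Apply each named service in order (composition) and record
    the names of the services that ran (the pathway)."""
    pathway = []
    for name, service in services:
        data = service(data)
        pathway.append(name)
    return data, pathway


# Two stand-in "services"; real ones would be remote analysis components.
def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def gc_report(seq: str) -> str:
    gc = sum(base in "GC" for base in seq)
    return f"{seq} GC={gc}/{len(seq)}"


workflow = [("reverse_complement", reverse_complement), ("gc_report", gc_report)]
result, pathway = run_workflow(workflow, "ATGC")
print(result)
print(pathway)
```

Keeping the pathway separate from the result is what lets a discovery process be stored, audited and re-run later.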

  26. Dynamic Integration Services • Dynamic Application Integration = On-demand access and composition of remote analysis components • Towards a Dynamic Component Integration: • Knowledge Servers: allow users to register, locate and remotely execute components • Execution Servers: allow users to control the execution of components in distributed environments • Easy Maintenance: New components can be added through a clean API • Example components behind the D-NET API: Clustering, Classification, Text analysis, Gene function prediction, Promoter Prediction, Homology Search
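
The register / locate / execute roles attributed to Knowledge Servers can be sketched as a small in-process registry. The class name `KnowledgeServer`, its method names and the tag scheme are assumptions made for this example; the actual D-NET API is not shown in the slides, and a real server would execute components remotely.

```python
class KnowledgeServer:
    """Toy in-process registry: register, locate and execute components."""

    def __init__(self):
        self._components = {}

    def register(self, name, func, tags=()):
        # Adding a new component is a single call.
        self._components[name] = (func, frozenset(tags))

    def locate(self, tag):
        # Return the names of all components advertised under a tag.
        return sorted(
            name for name, (_, tags) in self._components.items() if tag in tags
        )

    def execute(self, name, *args, **kwargs):
        # A real server would dispatch this to a remote execution server.
        func, _ = self._components[name]
        return func(*args, **kwargs)


server = KnowledgeServer()
server.register(
    "gc_content",
    lambda seq: sum(base in "GC" for base in seq) / len(seq),
    tags={"sequence"},
)
server.register("word_count", lambda text: len(text.split()), tags={"text"})

print(server.locate("sequence"))
print(server.execute("word_count", "gene function prediction"))
```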

  27. Case Study: SC2002 HPC Challenge • D-Net based Global Collaborative Real-Time Genome Annotation • High Throughput Sequencers feed Genome Annotation at three levels • 15 DBs, 21 Applications • Nucleotide-level Annotation: Identify Organism, Chromosomes, Organism’s DNA, Identify Genes, tRNAs, rRNAs, Gene markers, Non-translated RNAs, Regulatory Regions, Repetitive Elements, Segmental Duplication, SNP Variations, Literature References (EMBL, NCBI, TIGR, SNP, genscan, blast, grail, Repeat Masker, E-PCR) • Protein-level Annotation: Identify Proteins, Classify into Protein Families, Functional Characterisation, Homologues, Domain, 3-D Structure, Motif Search, Fold Prediction, Secondary structure, Literature References (Inter Pro, blast, 3D-PSSM, SMART, SWISS PROT, PFAM, predator, DSC) • Process-level Annotation: Relate Pathway Maps, Ontologies, Cell Cycle, Metabolism, Drugs, Biological Process, Cell death, Embryogenesis, Literature References (GO, CSNDB, AmiGO, GeneMaps, KEGG, GK, virtual chip, GenNav)

  28. How It Works • Nucleotide Annotation Workflows • Download sequence from Reference Server • Execute distributed annotation workflow (KEGG, Inter Pro, SMART, SWISS PROT, EMBL, NCBI, TIGR, SNP, GO) • Interactive Editor & Visualisation • Save to Distributed Annotation Server • 500 Web accesses • 1800 clicks • 200 copy/pastes • 3 weeks’ work in 1 workflow and a few seconds’ execution

  29. eDiamond: Applications of SMF • Teleradiology and QC • VirtualMammo • Training and Differential Diagnosis • “Find one like it”? • Advanced CAD • SMF-CAD workstation • Epidemiology • SMF-computed breast density

  30. Image guided interventions • Images courtesy of Derek Hill, Guy’s Hospital

  31. Image guided interventions (2) • Images courtesy of Guy’s Hospital

  32. Surgical verification: accuracy of surgical placement against plan • Surgeon plans on X-ray or CT, uses database of prostheses • Operation takes place using plan as guidance • Post-operative X-ray evaluated for accuracy of placement • Data stored and used for short term assessment and long term evaluation studies • Courtesy of Ian Revie, DePuy International

  33. Summary • UK e-Science projects emphasize data federation and integration as much as computation • Metadata and ontologies key to higher level Grid services • e-Science projects will produce a deluge of scientific data that will need to be annotated and curated in scientific data ‘digital libraries’

  34. Databases in the Grid • Diagram axes: Data Complexity, Computational Complexity • Regions: Semantic Web, Classical Web, Classical Grid

  35. OGSA – DAI Project • Key middleware project for UK Program - Total Budget £3M (CP £1.5M) • Three Centres involved: - Edinburgh, Manchester and Newcastle • Industrial partners: - IBM US, IBM Hursley and Oracle UK • Goal is to develop high-quality data-centric middleware

  36. OGSA – DAI Project • Design Specification completed • Papers for GGF WG on Database Access and Integration Services • Alpha versions delivered: • Distributed Query Service • XML Database Interface • Relational Database Interface • Beta versions by April 2003 • Integrate with Globus GT3 release

  37. e-Science and the Future of Scientific Research ‘e-Science will change the dynamic of the way science is undertaken.’ John Taylor, 2001 • Need to break down the barriers between the Victorian ‘bastions’ of science – biology, chemistry, physics, …. • Develop ‘permeable’ structures that promote rather than hinder multidisciplinary collaboration • Engage Computing Services and Libraries in developing a new e-Science support service on Campus

  38. e-Science and Computer Science • The lesson of the Web • The Semantic Grid • The myGrid project • The Discovery Net Project • Computer Science Research and the Grid

  39. Error 404: Page not found ‘If you want the Web to scale, You must allow the links to fail’ Wendy Hall after Tim Berners-Lee • HTML as the ‘Fortran’ of Hypertext!

  40. Semantic Web

  41. Metadata & Ontologies • Metadata – computationally accessible data about the services • Ontologies – the shared and common understanding of a domain • A vocabulary of terms • Definition of what those terms mean. • A shared understanding for people and machines • Usually organised into a taxonomy.

  42. Reasoning in DAML+OIL • Consistency— check if knowledge is meaningful • Subsumption— structure knowledge, compute classification • Equivalence— check if two classes denote same set of instances • Instantiation— check if individual instance of class C • Retrieval— retrieve set of individuals that instantiate C
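
Three of the reasoning tasks listed above (subsumption, instantiation and retrieval) can be illustrated on a toy taxonomy. This sketch is not a DAML+OIL reasoner: it only follows explicit parent links, the class names and individuals are chosen for illustration, and consistency and equivalence checking, which need a real description-logic reasoner, are not covered.

```python
# Toy taxonomy (assumed for illustration): child class -> parent class.
PARENT = {"Kinase": "Enzyme", "Enzyme": "Protein", "Protein": "Molecule"}

# Asserted individuals: individual -> most specific known class.
INSTANCE_OF = {"p53": "Protein", "CDK2": "Kinase"}


def ancestors(cls):
    """All classes that subsume cls, including cls itself."""
    chain = [cls]
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def subsumes(general, specific):
    """Subsumption: is every `specific` also a `general`?"""
    return general in ancestors(specific)


def is_instance(individual, cls):
    """Instantiation: is the individual an instance of class cls?"""
    return subsumes(cls, INSTANCE_OF[individual])


def retrieve(cls):
    """Retrieval: all known individuals that instantiate cls."""
    return sorted(i for i in INSTANCE_OF if is_instance(i, cls))


print(subsumes("Molecule", "Kinase"))
print(is_instance("CDK2", "Enzyme"))
print(retrieve("Protein"))
```

The point of the sketch is only that class membership is inferred rather than stored: CDK2 is asserted once as a Kinase, and its membership of Enzyme, Protein and Molecule follows from the taxonomy.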

  43. Computer Science Challenges from e-Science • UK CS Team led by Tom Rodden identified 4 major research challenges arising from e-Science: - Developing a Semantic Grid - Trusted Ubiquitous Systems - Rapid Customized Assembly of Services - Autonomic Computing

  44. Towards a Semantic Grid • Trace provenance from initial data to information and knowledge structures • Techniques to allow scalable reasoning over uncertain/incomplete knowledge • Tools for design, development and deployment of large-scale ontologies • Support for semantic-directed knowledge discovery to complement data-mining • Development of flexible network-based reasoning and decision support services

  45. Trusted Ubiquitous Systems • New theories to model, specify and analyse trust in distributed ubiquitous systems • New quality of service and service-based models for ubiquitous systems • New design guidelines and practices to enable the development of reusable trusted components • New understanding of the practical engineering trade-offs required to realise trusted ubiquitous systems

  46. Rapid Customised Assembly of Services • New theories to describe and reason about semantics and behaviour of services and compositional effects • Agent and service representations that promote adaptability and emergent, opportunistic and implicit arrangement of services • New tools to support the discovery, composition and use of services based on high-level description of requirements • Techniques to support directed automatic composition, decomposition and recomposition of services

  47. Autonomic Computing • Techniques to analyze, describe and reason about adaptive systems • Management of semi-autonomous systems with policies, services and software agents • Interoperability and reasoning across and between different autonomous domains • Modeling and measurement of performance of QoS for autonomic structures • Techniques to capture and represent history, context and environment

  48. IBM Autonomic Computing Vision • Self-Configuring: adapt automatically to dynamically changing environments • Self-Healing: discover, diagnose, and react to disruptions • Self-Protecting: anticipate, detect, identify, and protect against attacks from anywhere • Self-Optimizing: monitor and tune resources automatically

  49. A Business Case for the Grid • Total Cost of Ownership – TCO • Value of Open Standards • Industrial Applications • Time to exploitation • e-Utilities
