The Grid Needs You. Enlist Now! Professor Carole Goble, University of Manchester, UK, firstname.lastname@example.org. Co-director, e-Science North West UK regional centre; Director, myGrid UK e-Science pilot project; Co-chair, Global Grid Forum Semantic Grid Research Group.
The Grid Needs You. Enlist Now! • The what and why of the Grid • Services, data and semantics and the Grid • Getting involved – a call to arms
The take home • “The Grid is the next big thing” – and it isn’t just big computers and fat pipes. • The Grid is actually the latest attempt at distributed computing • If you aren’t involved yet, maybe it’s because you don’t think it’s relevant, or you think it’s done already, or you haven’t anything to offer • You are most likely wrong • If you are already into the Grid, this is a “ra ra” exercise
Origins of the Grid • The Grid: Blueprint for a New Computing Infrastructure • Edited by Ian Foster and Carl Kesselman • July 1998, 701 pages. • a proposed distributed computing infrastructure for advanced science and engineering • pervasive and dependable
What is the Grid? • Computational power as a utility • Securely and transparently sharing supercomputing resources on demand. • Fast pig iron with fat pipes for cycle intensive scientific problems • Large scale data access and transportation • Making the most of what you have got
Why do it now? • Enormous quantities of data: petabytes (114 genomes sequenced, 735 in progress) • For an increasing number of communities, the gating step is not collection but analysis • Ubiquitous Internet: 100+ million hosts • Collaboration & resource sharing the norm • Ultra-high-speed networks: 10+ Gb/s • Global optical networks • Huge quantities of computing: 100+ Top/s • Moore’s law gives us all supercomputers
What is the Grid for? • Global e-Science • Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed. • The motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering. KEYWORDS • Collaboration, Democratization, Speculation. Bill Johnston, NASA, July 01
Global Collaborative Knowledge Communities Slide courtesy of Ian Foster
Global Knowledge Communities • Teams organised around common goals • Communities: “Virtual organisations” • Overlapping memberships, resources and activities • Essential diversity is a strength & challenge • membership & capabilities • Geographic and political distribution • No location/organisation/country possesses all required skills and resources • Dynamic: adapt as a function of their situation • Adjust membership, reallocate responsibilities, renegotiate resources Slide derived from Ian Foster’s SSDBM 03 keynote
The Grid Opportunity “flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations." KEYWORD: VIRTUALISATION. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. Foster, Kesselman, Tuecke
Why Grids? • A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour; • A biologist combines a range of diverse and distributed resources (databases, tools, instruments) to answer complex questions; • 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data • Civil engineers collaborate to design, execute, & analyze shake table experiments • Climate scientists visualize, annotate, & analyze terabyte simulation datasets • An emergency response team couples real time data, weather model, population data • A multidisciplinary analysis in aerospace couples code and data in four companies Slide courtesy of Steve Tuecke
Telemicroscopy • Sharing of the UHVEM (Ultra High Voltage Electron Microscopy) facility at Osaka University with NCMIR (National Center for Microscopy and Imaging Research, San Diego) • 3 million electron volts; the most powerful microscopy facility • KEYWORDS: SHARING SCARCE RESOURCES ON DEMAND [Network diagram: UHVEM (Osaka, Japan) and NCMIR (San Diego) linked via Osaka University, Tokyo XP, TransPAC/APAN, JGN, STAR TAP (Chicago), vBNS and SDSC (UC San Diego)]
Smallpox Grid • United Devices, IBM, Oxford University, Accelrys • Analysis of 35 million drug compounds against nine smallpox proteins to try to find a way to stop the replication of the virus. • Volunteers from over 190 countries donated their spare CPU power at www.grid.org, the world's largest public computing resource • Contributed over 39,000 years of computing time in less than six months. • On September 30, 2003, the results of the Smallpox Research Grid project were delivered to representatives from the United States Department of Defense at an event hosted by the British Embassy in Washington, D.C.
Digital screening in numbers: • 2,000,000 screened every year • 120,000 recalled for assessment • 10,000 cancers • 1,250 lives saved • 230 radiologists (double reading) • 50% workload increase
RealityGrid http://www.realitygrid.org Closely coupling computation and experiment to speed up scientific discovery. Simulation, visualization and data gathering coupled Scientist remotely steers calculation from laptop Visualization and computation use supercomputers accessed via Grid. X-ray microtomography produces 3D X-ray attenuation maps of specimens at a microscopic level
Collaboration • Interactive environments and virtual presence integrated with Grid middleware • SARS Combat Grid, Taiwan • Emergency Access Grids • Integration of patient data and models of dissemination http://www.accessgrid.org
Foundation for e-Science [Diagram: the Grid linking computers, software, instruments, sensor nets, shared data archives and colleagues. Derived from Ian Foster's slide]
Butterfly.net • Fully-distributed server technology pioneering the use of open grid computing protocols in large-scale immersive game networks that support unlimited numbers of players and require the most demanding levels of service.
More commercial examples… • Novartis Pharmaceuticals: accelerating lead identification and profiling to increase relevant targets in drug discovery, screening applications that were previously considered CPU constrained. • Nippon Life Insurance: a customer project applying Grid technology to improve the performance of financial risk management applications. Reduced processing time for financial risk calculation from around 10 hours to about 49 minutes – a 12-fold increase in speed. Can now run more complex scenarios to reduce risk exposure.
Global Grid Forum http://www.ggf.org • Standards body for Grid Computing • Over 2000 members • All the vendors • 44 WGs and RGs • Three meetings per annum • ~1000 attendees at plenary meetings • ~400 at “working” meetings • GGF10 Frankfurt, March 2004
Investment • UK Government invested £240 million into e-Science and Grid related research • EU invested ~€351 million in FP5 and FP6 • USA invested – lots! • IBM invested ~10-20% of its R&D budget in Grid Computing • $1.5 million per annum on GridFTP alone • Japan and China invested in Grids • Practically every EU member has a Grid programme.
The Grid means what I say it means • The Grid – the vision of forming federations • A Grid - A virtual organisation of resources • Machines – computational grid • Geography – a UK Grid • A field – Mouse Genome Grid • A (temporary) problem – protein folding simulation • No one grid – lots of interoperating Grids • Grid middleware infrastructure specification • Services stacks, policies, protocols, standards, APIs • Reference implementations • Globus, Condor, Unicore, Sun Grid Engine, Avaki, United Devices... • Grid tools • Portals, heartbeat monitors etc • E-Science: application of all the above for the benefit of Science
The Grid is forming federations… • Infrastructure middleware for establishing, managing, and evolving multi-organizational federations • Dynamic, autonomous, domain independent • On-demand, ubiquitous access to computing, data, and services • Mechanisms for resource virtualization & workflow management within federations • New capabilities constructed dynamically and transparently from distributed services • Service-oriented, virtualization
…when the federations are… • Dynamic and volatile. A consortium of services (databases, sensors, compute servers) participating in a complex analysis may be switched in and out as they become available or cease to be available; • Ad hoc. Service consortia have no central location, no central control, and no existing trust relationships; • Large. Hundreds of services could be orchestrated at any time; • Potentially long-lived. A simulation could take weeks. HOLD THESE THOUGHTS!
myGrid http://www.mygrid.org.uk Knowledge-driven middleware for data intensive ad hoc in silico experiments in biology • Straightforward discovery, interoperation, deployment & sharing of services • Service-oriented architecture • Semantic based discovery of workflows and workflow composition • Integration and Information • Workflow & Distributed DB Queries • Experimentation • Provenance, propagating change, personalisation
Three legacy views • Grid middleware is a bag of low level protocols • The Grid is about compute cycle stealing • The Grid is about plumbing and has nothing to do with semantics
The antidotes: the Open Grid Services Architecture, Data Grids and Semantic Grids. The legacy views were once true, and some still hold them (notably US programme managers), but they are not the views of the Grid visionaries or of Grid policy makers outside the US.
Grid Evolution: 1st generation • Computationally intensive • File access/transfer • Bag of various heterogeneous protocols & toolkits: X.509, LDAP, FTP, …; Globus Toolkit; de facto standards (GGF: GridFTP, GSI); Legion, Condor, Unicore, …; custom solutions • Monolithic design • Recognises internet, ignores Web • Academic teams • Trend: increasing functionality and standardization over time (based on Foster GGF7 plenary)
Grid Evolution: 2nd generation • Data intensive -> knowledge intensive • Open services-based architecture: Open Grid Services Architecture built on Web services; GGF: OGSI, … (+ OASIS, W3C); multiple implementations, including Globus Toolkit 3; app-specific services layered on top • Recognises Web services • Global Grid Forum • Industry participation • Trend: increasing functionality and standardization over time (based on Foster GGF7 plenary)
Open Grid Services Architecture (ongoing since early 2002), as a layered stack: • Grid Applications: specific services, e.g. a drug discovery pipeline • OGSA: standard services: agreement, data access and integration, workflow, security, policy… • OGSI: standard interfaces and behaviours for distributed systems: naming, service state, lifetime management, notification • Web Services: standard mechanisms for describing and invoking services: WSDL, SOAP, WS-Security etc. (Graphic courtesy of Savas Parastatidis)
OGSI: Standard Web Services Interfaces & Behaviours • Naming and bindings (basis for virtualization) • Every service instance has a unique name (Grid Service Handle) from which one can discover its supported bindings, which are volatile (Grid Service Reference) • Two-tiered naming scheme to cope with service migration and failover • Lifecycle (basis for fault-resilient state management) • Service instances created by factories • Destroyed explicitly or via soft state • Information model (basis for monitoring & discovery) • Service data (attributes) associated with GS instances (SDEs) • Operations for querying (introspecting) and setting this info • Asynchronous notification of changes to service data • Service Groups (basis for registries & collective services) • Group membership rules & membership management • Base Fault type. All sound kind of familiar?
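The soft-state lifecycle above amounts to a lease: a client keeps its service instance alive by periodically extending its termination time, and the hosting environment reaps instances whose lease has lapsed, so a crashed client needs no explicit destroy. A minimal sketch of that idea (all class and method names here are hypothetical, not the OGSI API):

```python
import time

class ServiceInstance:
    """A toy grid service instance with a soft-state lifetime."""
    def __init__(self, handle, lifetime_s):
        self.handle = handle  # stable Grid Service Handle
        self.termination_time = time.time() + lifetime_s

    def extend_lifetime(self, lifetime_s):
        # Client keep-alive: push the termination time into the future.
        self.termination_time = max(self.termination_time,
                                    time.time() + lifetime_s)

class HostingEnvironment:
    """Holds instances and garbage-collects expired ones."""
    def __init__(self):
        self.instances = {}

    def create(self, handle, lifetime_s):
        # Factory-style creation with an initial lease.
        inst = ServiceInstance(handle, lifetime_s)
        self.instances[handle] = inst
        return inst

    def reap(self, now=None):
        # Destroy instances whose soft-state lease has expired.
        now = time.time() if now is None else now
        expired = [h for h, i in self.instances.items()
                   if i.termination_time <= now]
        for h in expired:
            del self.instances[h]
        return expired
```

A client that simply stops calling `extend_lifetime` lets its instance be collected by the next `reap`, which is what makes the scheme fault resilient.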
OGSI [Diagram: a client introspects a service instance (what port types? what policy? what state?) through the required GridService interface; other standard interfaces include factory, notification and collections. A Grid Service Handle resolves to a Grid Service Reference; service data elements hang off the instance; lifetime management is by explicit destruction or soft-state lifetime; the implementation sits in a hosting environment/runtime (“C”, J2EE, .NET, …) with data access beneath it.] (Slide courtesy of Ian Foster)
OGSI interaction pattern (interactions standardized using WSDL; authentication & authorization are applied to all requests): a service factory registers itself with a service registry; a service requestor (e.g. a user application) discovers the factory via the registry, asks it to create a service (resource allocation), and receives a Grid Service Handle for the new service instance; it then invokes the instance directly, with service data, keep-alives and notifications flowing alongside. (Slide courtesy of Ian Foster)
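The registry/factory/instance pattern just described can be mocked in a few lines; everything below is a toy stand-in (hypothetical names, in-memory objects) for what OGSI expresses as WSDL-described SOAP services:

```python
class Registry:
    """Toy service registry: maps topics to factory endpoints."""
    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def lookup(self, topic):
        return self._factories[topic]

class Factory:
    """Toy factory: creates service instances, returns their handles."""
    def __init__(self):
        self._next = 0
        self.instances = {}

    def create_service(self):
        self._next += 1
        handle = f"gsh:instance-{self._next}"  # Grid Service Handle
        self.instances[handle] = {"state": "running"}
        return handle

def acquire_service(registry, topic):
    """The requestor's side of the interaction."""
    factory = registry.lookup(topic)  # 1. service discovery
    return factory.create_service()   # 2. create service, get its handle
```

The point of the indirection is that the requestor never hard-codes an instance: it only knows a topic, and the registry and factory decide where and how the instance is allocated.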
Service migration via handle resolution: 1. the service migrates from Hosting Environment A to Hosting Environment B; 2. a new network endpoint (GSR) is registered with the HandleResolver for the same GSH (hdl:1.2/abc); 3. the requester's access with the old endpoint info (old GSR) fails; 4. the requester calls findByHandle(GSH); 5. the resolver returns a new GSR with the new network endpoint; 6. the requester successfully accesses the moved service through the new GSR. (Slide courtesy of Ian Foster)
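From the requester's side, the six migration steps reduce to a resolve-and-retry loop: use the cached GSR, and when it fails, go back to the handle resolver for a fresh one. A sketch under toy assumptions (the real HandleResolver is a SOAP port type; here endpoints are just dictionary keys):

```python
class StaleEndpoint(Exception):
    """Raised when even a freshly resolved GSR is unreachable."""
    pass

class HandleResolver:
    """Maps stable handles (GSH) to volatile references (GSR)."""
    def __init__(self):
        self._table = {}

    def bind(self, gsh, gsr):
        # Called when a service (re)registers after starting or migrating.
        self._table[gsh] = gsr

    def find_by_handle(self, gsh):
        return self._table[gsh]

def invoke(gsh, resolver, endpoints, cached_gsr=None):
    """Invoke via a cached GSR; on failure, re-resolve the GSH once."""
    gsr = cached_gsr or resolver.find_by_handle(gsh)
    if gsr not in endpoints:                 # old endpoint gone: migrated
        gsr = resolver.find_by_handle(gsh)   # steps 4-5: get a fresh GSR
    if gsr not in endpoints:
        raise StaleEndpoint(gsh)
    return endpoints[gsr]()                  # step 6: call via the new GSR
```

The two-tier scheme works precisely because clients treat the GSR as a cacheable but disposable binding, while the GSH never changes.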
Early OGSI implementations: Globus Toolkit 3, OGSI.NET, OGSI::Lite, Unicore. Layering a component-based distributed object model over a web service framework: • Web Services: loosely coupled, stateless, persistent • Grid Services: robust naming, stateful, lifetime management • CORBA: tightly coupled, naming, stateful, lifetime management. Sound familiar?
OGSI Status and Issues https://forge.gridforum.org/projects/ogsi-wg • OGSI version 1.0 is a GGF proposed recommendation • Issue: compliance with Web Service standards • GWSDL changes WSDL 1.1 by extending the portType syntax to define Service Data Elements • Why not use WS standards for state management idioms, e.g. WS-Context/Coordination? A service implementation would then stack service-specific operations and OGSA operations over WS-Context and/or other WS-* specifications • By eliminating a new mandatory infrastructure (OGSI), conventional tooling can be used • But it needs to meet the requirements of the Grid (Graphic courtesy of Savas Parastatidis)
300-pound gorillas • If you want to use standards then you have to use them or work with them • W3C and OASIS are big gorillas • E.g. GSH/GSR, Handle.net, Life Science Identifier and WS-Addressing
Grid Applications on the Move: the rise of the Information Grid. From: large-scale data, large numbers of machines, computationally intensive, simple semantics, small homogeneous communities (High Energy Physics). To: smaller-scale data, data intensive, complex heterogeneous applications, complex semantics, large diverse communities (Functional Genomics, Oceanography, Biodiversity, Earth Science, Neuroscience, …)
Data-intensive integration: what the e-scientist REALLY wants • Scientists do data integration • Actually they do application and model integration too! • Cooperative information systems • Workflows • Data virtualisation
… & Types of Information
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT   CONFLICT 374 374 S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
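Records like this are line-oriented: each line starts with a two-letter code, and multi-line fields simply repeat the code. A minimal illustrative parser for this flat-file style (a sketch, not a complete SWISS-PROT parser; the set of codes shown is deliberately small):

```python
# Two-letter line codes that appear in the sample record above.
CODES = {"ID", "AC", "DE", "GN", "OS", "OC", "KW", "FT", "SQ"}

def parse_flatfile(text):
    """Group a SWISS-PROT-style flat file by its two-letter line codes.

    Returns a dict mapping each code to the list of its line bodies,
    preserving order, so multi-line fields like DE or OC stay together.
    Sequence data lines (no recognised code) are skipped in this sketch.
    """
    fields = {}
    for line in text.splitlines():
        code, _, rest = line.strip().partition(" ")
        if code in CODES:
            fields.setdefault(code, []).append(rest.strip())
    return fields
```

Exactly this kind of idiosyncratic, semi-structured format is why "types of information" matters for data integration on the Grid: every resource needs its own wrapper before anything can be combined.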
Data on the Grid pre-OGSA • Chiefly files! • LDAP as a query language • No RDBMS access from Globus 1.1 • MDS and MCAT catalogs • Honorable exception: the Storage Resource Broker: “Support data-intensive applications that manipulate very large data sets by building upon object-relational database technology and archival storage technology”
OGSA Data Access and Integration: GGF OGSA-DAIS WG • Data Grid applications benefit from many lower-level services: data movement, data replication, data virtualisation, database access and integration • Work underway on designing, developing and standardising many core Grid Data Management services • Designing services in a dynamic and heterogeneous environment is non-trivial • Plenty to be done!! Technology stack, top to bottom: clever semantic integration; OGSA-DAI distributed query; OGSA-DAI basic services; Data Grid infrastructure (location, delivery, replication…); Resource Grid infrastructure (OGSA…); database, communication, OS
Distributed Structured Data Virtual Integration Architecture [Diagram, top to bottom: Data Intensive X-ology Researchers; Data Intensive Applications for X-ology Research; Simulation, Analysis & Integration Technology for X-ology; Generic Virtual Data Access and Integration Layer; OGSA services (job submission, brokering, workflow, structured data integration, registry, banking, authorisation, data transport, resource usage, transformation, structured data access); OGSI: interface to Grid infrastructure; compute, data & storage resources holding distributed structured data (relational, XML, semi-structured).] (Slide courtesy of Malcolm Atkinson, UK National e-Science Centre)
OGSA-DAI, OGSA-DAIS, OGSA-DAIT • Part of Globus Toolkit 3 • Data can be XML, RDBMS or ODBMS • UK dominance • DB2, Oracle 10g
Data Access & Integration Services: the OGSA-DAI interaction pattern (SOAP/HTTP throughout, with service-creation API interactions): 1a. the client requests from the Registry sources of data about “x”; 1b. the Registry responds with a Factory handle. 2a. the client requests access to the database from the Factory; 2b. the Factory creates a GridDataService (GDS) to manage access; 2c. the Factory returns the handle of the GDS to the client. 3a. the client queries the GDS with XPath, SQL, etc.; 3b. the GDS interacts with the XML/relational database; 3c. the results of the query are returned to the client as XML. (Slide courtesy of Malcolm Atkinson, UK e-Science Centre)
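The three-phase flow above (registry lookup, factory creation, then a per-session Grid Data Service) can be mocked end to end. Everything here is a toy stand-in for the real OGSA-DAI SOAP interfaces: the database is a dict, the "perform document" is a key, and the XML result is a formatted string.

```python
class GridDataService:
    """Per-session data service wrapping one data source (a dict here)."""
    def __init__(self, db):
        self._db = db

    def perform(self, key):
        # Stands in for an XPath/SQL perform document; step 3a-3c.
        return f"<result>{self._db[key]}</result>"

class GDSFactory:
    """Creates a GridDataService per client request (step 2b)."""
    def __init__(self, db):
        self._db = db

    def create_gds(self):
        return GridDataService(self._db)

class DAIRegistry:
    """Maps data topics to factories (steps 1a-1b)."""
    def __init__(self):
        self._sources = {}

    def register(self, topic, factory):
        self._sources[topic] = factory

    def find_factory(self, topic):
        return self._sources[topic]

def query(registry, topic, key):
    factory = registry.find_factory(topic)  # phase 1: discover
    gds = factory.create_gds()              # phase 2: create session service
    return gds.perform(key)                 # phase 3: query, XML back
```

The session-scoped GDS is the key design choice: the client never talks to the database directly, so access control, transformation and delivery can all be interposed by the middleware.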
Virtual Data Concept • Capture and manage information about relationships among: data (of widely varying representations), programs (& their execution needs), and computations (& execution environments) • Apply this information to, e.g.: discovery (data and program discovery), workflow (organizing, locating, specifying & requesting data), explanation (provenance), planning and scheduling • Example query: search for WW decays of the Higgs Boson for which only stable, final-state particles are recorded. [Workflow DAG with nodes labelled by parameters such as mass = 200; decay = bb, ZZ or WW; stability = 1 or 3; LowPt = 20; HighPt = 10000; event = 8; plot = 1] Workflow by Rick Cavanaugh and Dimitri Bourilkov, University of Florida