
Introduction to eInfrastructure

Introduction to eInfrastructure. Jennifer M. Schopf UK National eScience Centre Argonne National Lab. Talk Outline. Definition of Grids, eInfrastructure, and eResearch JISC plans Globus Toolkit Provider of basic infrastructure Focus on data tools OMII – Open Middleware Infrastructure


Presentation Transcript


  1. Introduction to eInfrastructure • Jennifer M. Schopf • UK National eScience Centre • Argonne National Lab

  2. Talk Outline • Definition of Grids, eInfrastructure, and eResearch • JISC plans • Globus Toolkit • Provider of basic infrastructure • Focus on data tools • OMII – Open Middleware Infrastructure • UK repository and distribution of eResearch tools

  3. What is a Grid? • Many definitions – many differences especially between academics and industry • Both use the buzzword to get funding • My definition • Resource sharing • Coordinated problem solving • Dynamic, multi-institutional virtual orgs

  4. Resource Sharing • Resources can be anything: • Computers • Storage/repositories • Sensors and Networks • People and software • Local control of the resources, and local policies for their use • Sharing is always conditional • Issues of trust, policy • Negotiation and payment

  5. Coordinated Problem Solving • Beyond client-server • Client-server defines a small set of well-understood interactions as the only ones that can take place • Actions in this space can include • Distributed data analysis • Computation and visualization of results • Collaboration

  6. Virtual Organization (VO) Concept • VO for each application or workload • Carve out and configure resources for a particular use and set of users

  7. Dynamic, Multi-institutional Virtual Organizations • Crossing administrative domains • No one has full control over the resources • Local policy not global • Different local policy on different sites • Community overlays on classic organizational structures • Large or small, static or dynamic

  8. What is eScience or eResearch? • Use of distributed resources, in a coordinated way, across multiple administrative domains to do science or further your research • “Classic” eScience • Use compute and data resources at many sites to run large scale simulations for a physics or biology application • Today’s Use Cases • Replicate data across multiple sites to increase reliability, redundancy and performance • Use one common interface to access a variety of data resources at multiple sites • Look at a number of available resources to select the one that best suits the application needs at this time

  9. What is eInfrastructure? • “A framework (political, technological and administrative) for the easy and cost-effective shared use of distributed electronic resources across a geographical area” • “The combination of research infrastructure, grid, and broadband technologies projects” • “Anything that enables eScience, collaborative research – distributed, persistent, reliable, accessible services” • “Broader than Grids - includes things like digital libraries, networking, etc” • “current Grid-based eInfrastructure model”

  10. How does JISC define it? • “Similar to NSF’s cyberinfrastructure work” (CI==Grids) • Tony Hey (JCSR chair) says • “A national eInfrastructure to support collaborative and multidisciplinary research and innovation is the joint responsibility of RCUK (OST) and JISC (HEFCs)” • 2006 eInfrastructure–Grid initiatives continue building advanced Grid-empowered infrastructures • Production quality & ready-to-use SW • Environments dynamically adaptable to user needs

  11. Malcolm Read has said • E-infrastructure includes: • Networks (internet, light paths…) • Computers (workstations, servers, HPC…) • Access controls (security, AAA…) • Middleware (metadata…) • Finding tools (portals, search engines…) • Digital libraries (bibliographic, text, images, sound…) • Research data (national and scientific databases, individual data…)

  12. JISC funding for eInfrastructure • July 27 ’05 press release for additional funds • http://www.jisc.ac.uk/index.cfm?name=news_spendingreview • Continued development of JANET • Further digitisation of major scholarly collections • Enhancement to e-learning programmes (e-assm’t, e-portfolios, e-learning tools) • Development of the e-infrastructure • Incl development of collaborative env’ts • Development of a shared infrastructure to support use of institutional repositories

  13. Much Still To Be Defined • I’ve been told ~ £11M specifically for eInfrastructure • Starting in April 2006, 2 years of funding • Programme manager being hired • OST roadmap is basis (due by March, no draft available yet) • Areas are (no mapping to funding amount) • 1: Middleware/AA/DRM • 2: Networks and Computer Power (Hardware) • 3: Preservation and Curation • 4: Search and Navigation • 5: Data and Information Creation • 6: Virtual Research Communities

  14. JISC cont. • When this is better formulated, it will be broadcast widely • There’s a JCSR meeting in mid February where some of it should be solidified

  15. Questions on Definitions or JISC?

  16. Two Common eInfrastructure Approaches in the UK • Globus Toolkit • Open Middleware Infrastructure Institute (OMII) release

  17. What functionality is needed to use a Grid? • Basics: • Run a job • Transfer a file • Find out what’s going on (service and job monitoring) • All done securely • Higher-level • Replication • Higher level data movement • Workflow-scheduling

  18. Globus Toolkit Was Created To Help Applications • The Globus Toolkit consists of collections of solutions to problems that frequently come up when trying to build collaborative distributed applications • Heterogeneity • Focus on simplifying heterogeneity for application developers • Working towards more “vertical solutions” • Standards • Capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF) • Reference implementations of new/proposed standards in these organizations • Open source, open contribution model

  19. Globus is an Hour Glass • Local sites have their own policies, installs – heterogeneity! • Queuing systems, monitors, network protocols, etc • Globus unifies • Build on Web services • Use WS-RF, WS-Notification to represent/access state • Common management abstractions & interfaces [Diagram, top to bottom: Higher-Level Services and Users; Standard GT4 Interfaces; Local heterogeneity]

  20. Globus Toolkit: Open Source Grid Infrastructure (Globus Toolkit v4, www.globus.org) • Security: Authentication Authorization, Delegation, Community Authorization, Credential Mgmt • Data Mgmt: GridFTP, Reliable File Transfer, Replica Location, Data Replication, Data Access & Integration • Execution Mgmt: Grid Resource Allocation & Management, Community Scheduling Framework, Workspace Management, Grid Telecontrol Protocol • Info Services: Index, Trigger, WebMDS • Common Runtime: Java Runtime, C Runtime, Python Runtime

  21. GT4 Web Services Core • Supports both GT (GRAM, RFT, Delegation, etc.) & user-developed services • Redesign to enhance scalability, modularity, performance, usability • Leverages existing WS standards • WS-I Basic Profile: WSDL, SOAP, etc. • WS-Security, WS-Addressing • Adds support for emerging WS standards • WS-Resource Framework, WS-Notification • Java, Python, & C hosting environments • Java is standard Apache

  22. WSRF & WS-Notification • Naming and bindings (basis for virtualization) • Every resource can be uniquely referenced and has one or more associated services for interacting • Lifecycle (basis for resilient state management) • Resources created by svcs following a factory pattern • Resource destroyed immediately or scheduled • Information model (basis for monitoring & discovery) • Resource properties associated with resources • Operations for querying and setting this info • Asynchronous notification of changes to properties • Service groups (basis for registries & collective svcs) • Group membership rules and membership management • Base fault type
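The lifecycle and information-model ideas on this slide can be sketched in a few lines of plain Python. This is a toy model, not the WSRF specification or any GT4 API: the class names `Resource` and `ResourceFactory`, the key format, and the callback-style notification are all assumptions made for illustration.

```python
import itertools


class Resource:
    """Hypothetical stateful resource with properties and a termination time."""

    def __init__(self, key, properties):
        self.key = key                      # unique reference for this resource
        self.properties = dict(properties)  # resource properties
        self.termination_time = None        # None means no scheduled destruction
        self.subscribers = []               # callbacks for property changes

    def set_property(self, name, value):
        old = self.properties.get(name)
        self.properties[name] = value
        # Notification is modelled as a plain callback (an assumption).
        for callback in self.subscribers:
            callback(self.key, name, old, value)


class ResourceFactory:
    """Creates resources and hands back their keys (factory pattern)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.resources = {}

    def create(self, **properties):
        key = f"resource-{next(self._ids)}"
        self.resources[key] = Resource(key, properties)
        return key

    def destroy(self, key):
        # Immediate destruction.
        self.resources.pop(key, None)

    def set_termination_time(self, key, when):
        # Scheduled destruction.
        self.resources[key].termination_time = when

    def sweep(self, now):
        # Remove every resource whose termination time has passed.
        expired = [k for k, r in self.resources.items()
                   if r.termination_time is not None and r.termination_time <= now]
        for key in expired:
            del self.resources[key]
        return expired


factory = ResourceFactory()
job = factory.create(state="pending")
events = []
factory.resources[job].subscribers.append(lambda *change: events.append(change))
factory.resources[job].set_property("state", "active")
factory.set_termination_time(job, when=100.0)
expired = factory.sweep(now=101.0)
```

The point of the sketch is only that a client holds a key, not the object: state lives with the service, can be queried or changed through properties, and goes away either on request or when its lifetime runs out.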

  23. WSRF vs XML/SOAP • The definition of WSRF means that the Grid and Web services communities can move forward on a common base • Why Not Just Use XML/SOAP? • WSRF and WS-N are just XML and SOAP • WSRF and WS-N are just Web services • Benefits of following the specs: • These patterns represent best practices that have been learned in many Grid applications • There is a community behind them • Why reinvent the wheel? • Standards facilitate interoperability

  24. Basic Globus Security Mechanisms • Grid-wide identities implemented as PKI certificates • Transport-level and message-level authentication • Ability to delegate credentials to agents • Ability to map between Grid & local identities • Local security administration & enforcement • Single sign-on support implemented as “proxies” • A “plug in” framework for authorization decisions
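The "map between Grid & local identities" bullet can be illustrated with a small lookup table. The sketch below assumes a text format of one quoted certificate subject followed by a comma-separated list of local accounts per line, modelled on the grid-mapfile convention; the subject names, account names, and function names are hypothetical, and this is not the toolkit's own parser.

```python
import shlex


def parse_gridmap(text):
    """Map certificate subject names to lists of local account names."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex keeps the quoted subject name together as one token.
        subject, accounts = shlex.split(line)
        mapping[subject] = accounts.split(",")
    return mapping


def local_account(mapping, subject):
    """Return the first local account for a Grid identity, or refuse."""
    accounts = mapping.get(subject)
    if accounts is None:
        # Local policy decides: unknown identities get no local access.
        raise PermissionError(f"no local mapping for {subject}")
    return accounts[0]


# Hypothetical entries, for illustration only.
GRIDMAP = '''
# subject name                          local account(s)
"/O=Grid/OU=Example/CN=Jane Doe"        jdoe
"/O=Grid/OU=Example/CN=Sam Smith"       ssmith,physics
'''

mapping = parse_gridmap(GRIDMAP)
account = local_account(mapping, "/O=Grid/OU=Example/CN=Jane Doe")
```

Keeping the table at each site is what lets the Grid-wide identity stay global while administration and enforcement stay local.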

  25. The Challenge of Grid Resource Management • Enabling secure, controlled remote access to heterogeneous computational resources and management of remote computation • Authentication and authorization • Resource discovery & characterization • Reservation and allocation • Computation monitoring and control • Addressed by a set of protocols & services • GRAM protocol as a basic building block • Resource brokering & co-allocation services • GSI for security, MDS for discovery

  26. GT4 Execution Management (GRAM) • Common WS interface to schedulers • Unix, Condor, LSF, PBS, SGE, … • More generally: interface for process execution management • Lay down execution environment • Stage data • Monitor & manage lifecycle • Kill it, clean up • A basis for application-driven provisioning
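The "stage data, monitor & manage lifecycle, kill it, clean up" sequence is essentially a state machine, which can be sketched as follows. The state names and the transition table are assumptions chosen for illustration; they are not taken from the GRAM interface, and `kill` is simplified to a direct move to a failed state.

```python
# Hypothetical job states and the transitions allowed between them.
ALLOWED = {
    "unsubmitted": {"stage_in", "failed"},
    "stage_in": {"pending", "failed"},
    "pending": {"active", "failed"},
    "active": {"stage_out", "failed"},
    "stage_out": {"clean_up", "failed"},
    "clean_up": {"done", "failed"},
    "done": set(),
    "failed": set(),
}


class Job:
    """Toy model of a managed job; it does not run anything."""

    def __init__(self, executable):
        self.executable = executable
        self.state = "unsubmitted"
        self.history = [self.state]

    def advance(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)

    def kill(self):
        # Killing a finished job is a no-op; otherwise the job fails.
        if self.state not in ("done", "failed"):
            self.advance("failed")


job_ok = Job("/bin/hostname")
for step in ("stage_in", "pending", "active", "stage_out", "clean_up", "done"):
    job_ok.advance(step)

job_killed = Job("/bin/sleep")
job_killed.advance("stage_in")
job_killed.kill()
```

Whatever scheduler sits underneath (Condor, LSF, PBS, and so on), a common interface only has to expose states like these, which is what makes monitoring and control uniform across sites.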

  27. GT4 Data Functions • Find your data: Replica Location Service • Managing ~40M files in production settings • Move/access your data: GridFTP, Reliable File Transfer (RFT) • High-performance striped data movement • Couple data & execution management • GRAM uses GridFTP and RFT for staging • Access databases through standard Grid interfaces: OGSA-DAI
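At its core, "find your data" means mapping a logical file name to the physical copies that exist at different sites. A minimal in-memory sketch, assuming a hypothetical `ReplicaCatalog` class and made-up `gsiftp://` URLs on example hosts (this is not the Replica Location Service API):

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalogue: logical file name -> set of physical locations."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def add(self, logical_name, physical_url):
        self._replicas[logical_name].add(physical_url)

    def remove(self, logical_name, physical_url):
        self._replicas[logical_name].discard(physical_url)

    def lookup(self, logical_name):
        # Sorted so that callers see a stable order.
        return sorted(self._replicas.get(logical_name, ()))


catalog = ReplicaCatalog()
# Hypothetical names and URLs, for illustration only.
catalog.add("lfn://run42/events.dat", "gsiftp://site-a.example.org/data/events.dat")
catalog.add("lfn://run42/events.dat", "gsiftp://site-b.example.org/store/events.dat")
replicas = catalog.lookup("lfn://run42/events.dat")
missing = catalog.lookup("lfn://run42/unknown.dat")
```

An application asks for the logical name and then picks whichever replica is closest or least loaded, which is how replication buys both redundancy and performance.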

  28. GridFTP in GT4 • Basic file transfer support, and memory-to-memory copies • High-performance, secure, reliable data transfer • Optimized for high-bandwidth wide-area networks • FTP with well-defined extensions • Uses basic Grid security (control and data channels) • Multiple data channels for parallel transfers • Partial file transfers • Third-party (direct server-to-server) transfers • Performance tuning • Greatly improves performance over most FTP implementations • On the TeraGrid network, achieved 27 Gb/s on a 30 Gb/s link (90% utilization) with 32 nodes
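Parallel data channels and partial file transfers both rest on the same idea: a file is treated as a set of byte ranges that can be moved independently. The sketch below shows only that arithmetic, plus the utilization figure quoted on the slide; the function names are hypothetical and no network transfer or GridFTP call is involved.

```python
def split_ranges(size, channels):
    """Split [0, size) into contiguous (offset, length) chunks, one per channel."""
    if channels < 1:
        raise ValueError("need at least one channel")
    base, extra = divmod(size, channels)
    ranges = []
    offset = 0
    for i in range(channels):
        # The first `extra` channels carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def utilization(achieved, capacity):
    """Fraction of link capacity actually used (same units for both)."""
    return achieved / capacity


data = bytes(range(10))
ranges = split_ranges(len(data), channels=4)
# Each chunk could travel on its own channel; here they are simply rejoined.
reassembled = b"".join(data[off:off + length] for off, length in ranges)
link_use = utilization(27, 30)  # figures from the slide, in Gb/s
```

With 10 bytes and 4 channels the chunks are 3, 3, 2 and 2 bytes long, and 27 Gb/s on a 30 Gb/s link works out to the 90% utilization stated above.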

  29. Reliable File Transfer: Third Party Transfer • Fire-and-forget transfer • Web services interface • Many files & directories • Integrated failure recovery [Diagram: an RFT Client exchanges SOAP messages and optional notifications with the RFT Service, which drives two GridFTP servers; each server has a protocol interpreter, master DSI, slave DSI, IPC receiver and data channel, joined by IPC links]
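"Fire-and-forget" with "integrated failure recovery" means the client hands over a list of files and the service keeps retrying on its behalf. A minimal sketch of that retry loop, assuming a hypothetical `copy` callable that raises `IOError` on failure (the real service works through a Web services interface and GridFTP, neither of which is modelled here):

```python
def reliable_transfer(files, copy, max_attempts=3):
    """Copy every file, retrying failures; return (done, failed) lists."""
    done, failed = [], []
    for name in files:
        for attempt in range(1, max_attempts + 1):
            try:
                copy(name)
                done.append(name)
                break
            except IOError:
                # Give up on this file only after the last attempt.
                if attempt == max_attempts:
                    failed.append(name)
    return done, failed


attempts = {}


def flaky_copy(name):
    # Hypothetical copy function: "b.dat" fails once, "c.dat" always fails.
    attempts[name] = attempts.get(name, 0) + 1
    if name == "b.dat" and attempts[name] == 1:
        raise IOError("connection dropped")
    if name == "c.dat":
        raise IOError("source unavailable")


done, failed = reliable_transfer(["a.dat", "b.dat", "c.dat"], flaky_copy)
```

One bad file does not stop the batch: transient errors are absorbed by retries and only persistent failures are reported back.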

  30. Monitoring and Discovery System (MDS4) • Grid-level monitoring system used most often for resource selection • Aid user/agent to identify host(s) on which to run an application • Uses standard interfaces to provide publishing of data, discovery, and data access, including subscription/notification • WS-ResourceProperties, WS-BaseNotification, WS-ServiceGroup • Functions as an hourglass to provide a common interface to lower-level monitoring tools

  31. MDS4 Components • Information providers • Basic data sources – queue data, cluster data, etc • Can be from web services, executables, files • Index Service • Caching registry of data • Trigger Service • Warnings when conditions are met • WebMDS • Visualization of data
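How the components fit together can be sketched as an information provider feeding a caching index, with a trigger check on top. Everything here is a toy under stated assumptions: `IndexService`, `check_triggers`, the time-to-live cache policy, and the sample queue data are hypothetical and do not reflect the MDS4 interfaces.

```python
class IndexService:
    """Toy caching registry: re-reads a provider only when its entry is stale."""

    def __init__(self, ttl):
        self.ttl = ttl          # seconds a cached entry stays fresh
        self.providers = {}     # name -> callable returning a dict
        self.cache = {}         # name -> (timestamp, data)

    def register(self, name, provider):
        self.providers[name] = provider

    def query(self, name, now):
        cached = self.cache.get(name)
        if cached and now - cached[0] < self.ttl:
            return cached[1]
        data = self.providers[name]()
        self.cache[name] = (now, data)
        return data


def check_triggers(index, rules, now):
    """Return the warning message of every rule whose condition holds."""
    warnings = []
    for name, condition, message in rules:
        if condition(index.query(name, now)):
            warnings.append(message)
    return warnings


calls = []


def queue_provider():
    # Hypothetical information provider returning made-up queue data.
    calls.append(1)
    return {"free_cpus": 2, "queued_jobs": 40}


index = IndexService(ttl=60)
index.register("cluster-a", queue_provider)
rules = [("cluster-a", lambda d: d["queued_jobs"] > 25, "cluster-a queue is long")]
first = check_triggers(index, rules, now=0)
second = check_triggers(index, rules, now=30)   # served from the cache
third = check_triggers(index, rules, now=90)    # cache expired, provider re-read
```

The cache is what keeps a Grid-level view cheap: many queries and trigger checks are answered from the index, and the underlying monitoring tool is consulted only when the data has aged out.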

  32. Tested Platforms • Debian • Fedora Core • FreeBSD • HP/UX • IBM AIX • Red Hat • Sun Solaris • SGI Altix (IA64 running Red Hat) • SuSE Linux • Tru64 Unix • Apple MacOS X (no binaries) • Windows – Java components only • List of binaries and known platform-specific install bugs at http://www.globus.org/toolkit/docs/4.0/admin/docbook/ch03.html

  33. Many Tools Build on, or Can Contribute to, GT4-Based Grids • Condor-G, DAGman • MPICH-G2 • GRMS • Nimrod-G • Ninf-G • Open Grid Computing Env. • Commodity Grid Toolkit • GriPhyN Virtual Data System • Virtual Data Toolkit • GridXpert Synergy • Platform Globus Toolkit • VOMS • PERMIS • GT4IDE • Sun Grid Engine • PBS scheduler • LSF scheduler • GridBus • TeraGrid CTSS • NEES • IBM Grid Toolbox • …

  34. Any questions about Globus?

  35. Open Middleware Infrastructure Institute • Mission: to be a leading provider of reliable, interoperable and open-source Grid middleware components, services and tools to support advanced Grid-enabled solutions in academia and industry • Formed at the University of Southampton (2004) • Focus on an easy to install e-Infrastructure solution • Utilise existing software & standards • Expanding with new partners in 2006 • OGSA-DAI team at Edinburgh • myGrid team at Manchester • Slides compliments of Steven Newhouse

  36. OMII Functions • Provide a software repository of Grid components and tools from e-science projects • Re-engineer software, harden it, and provide support for components sourced from the community • Contract the development of “missing” software components necessary in grid middleware (managed programme) • Provide an integrated grid middleware release of the sourced software components • Slides compliments of Steven Newhouse

  37. The Managed Programme: Distribution and Repository • OGSA-DAI (Data Access service) • GridSAM (Job Submission & Monitoring service) • Grimoires (Registry service based on UDDI) • GeodiseLab (Matlab & Jython environments) • FINS (Notification services using WS-Eventing) • BPEL (Workflow service) • MANGO (Managing workflows with BPEL) • FIRMS (Reliable messaging) Slides compliments of Steven Newhouse

  38. So… • eInfrastructure has many definitions – but basically it’s Grid computing • JISC has funding for this – but hasn’t yet defined where it will be spent • Globus Toolkit provides many basic tools, and is incorporated in many projects, esp those focused on data movement • In the UK, OMII is another useful source of eInfrastructure software

  39. Additional Information • Contact: • Jennifer M. Schopf • jms@mcs.anl.gov • http://www.mcs.anl.gov/~jms • Globus Alliance: • http://www.globus.org • Information about OMII: • http://www.omii.ac.uk • s.newhouse@omii.ac.uk
