1 / 36

Grid Technologies Research and Development

Grid Technologies Research and Development. Ian Foster Argonne National Laboratory The University of Chicago. Credits. Globus project Co-PI: Carl Kesselman, USC Globus resarchers and developers at ANL, USC/ISI, NCSA, and elsewhere

walden
Download Presentation

Grid Technologies Research and Development

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Grid TechnologiesResearchand Development Ian Foster Argonne National Laboratory The University of Chicago

  2. Credits • Globus project Co-PI: Carl Kesselman, USC • Globus resarchers and developers at ANL, USC/ISI, NCSA, and elsewhere • Steve Tuecke, Randy Butler, Steve Fitzgerald, Brian Toonen, Gregor von Laszewski, and many others • Research supported by DARPA, DOE, NSF, NASA; equipment from Cisco Systems

  3. Grid Services Architecture:An Emerging Grid Computing Framework … a rich variety of applications ... Applns Appln Toolkits Remote data toolkit Async. collab. toolkit Remote sensors toolkit Remote comp. toolkit Remote viz toolkit ... Protocols, authentication, policy, resource management, instrumentation, discovery, etc., etc. Grid Services Grid Fabric Archives, networks, computers, display devices, etc.; associated local services

  4. Overview • Why Grid Services? • Review of existing Grid services • Security • Information/directory • Resource management • Data access • Our current research focus areas • Grid Forum and b-Grid project

  5. Creating a Usable Grid :Grid Services (“Middleware”) • Standard grid services that • Provide uniform, high-level access to a wide range of resources (including networks) • Address interdomain issues of security, policy, etc. • Permit application-level management and monitoring of end-to-end performance • Middleware-level and higher-level APIs and tools targeted at application programmers • Map between application and Grid

  6. The Challenge of Heterogeneity • Group • Institutions, people; policies • Resources • Hardware: computers, archives, networks, ... • Interface • Software, mechanisms • Distance • Local, campus, metropolitan, wide area • Scale • Single CPU, cluster, supercomputer, ...

  7. Grid Services Approach • Define and deploy standard Grid services that encapsulate heterogeneity • Simple: Cost of joining Grid is low • Noncoercive: Sites retain local control • Uniform: Cost of using Grid is low • Use a Grid information service to represent structure and status of Grid elements • Resource discovery • Application configuration and optimization • Build Grid-enabled tools to enable applications

  8. Grid Services • Security: authentication, authorization • Information: publication, delivery • Resource management: reservation, allocation, monitoring, control • Data: data access, replica management, metadata access • + fault detection, executable management, accounting, others

  9. Grid Services (1):Grid Security Infrastructure • Define uniform authentication and authorization mechanisms that allow cooperating sites to accept credentials while retaining local control • Benefit: Only one A/A infrastructure needs to be maintained at each site; enables inter-site resource sharing & interoperability • Requires • Authentication/authorization standards • Certification authority policies

  10. Authentication • “Grid Security Infrastructure” • Single sign-on via global credential, PKI mechanisms, mapping to local credentials • Delegation • No plaintext passwords • Retains local control over policy • Deployed across PACI and NASA sites • GSS-API binding, used by ssh, SecureCRT, gsiftp, Globus, Condor, others • GAA (Generic Authorization & Access Control) interface provides hooks for policy

  11. Site 1 Site 2 global-to-local mapping table global-to-local mapping table Resource Proxy Resource Proxy Crp Crp Process Process Cp Cp Process Process Local Policy and Mechanisms Local Policy and Mechanisms Security Architecture Protocol 1: user proxy creation Host User Crp User Proxy Protocol 2: resource allocation Cp Protocol 3: Resource allocation from a process

  12. Grid Services (2):Grid Information Service • Effective resource use predicated on knowledge of system components • Publish structure and state info, dynamic performance info, software info, etc., etc. • Selection and scheduling of resources • Resource discovery: “find me an X with property Y available at time T” • Auto-configuration: “tell me what I need to know to use A efficiently/securely/...” • Gateways to other data sources required

  13. Information ServicesTechnical Approaches • Infrastructure based on common protocols • LDAP as unifying communication protocol • Gateways to alternative information sources and organization • Research questions include • Unifying metadata representation • How to support range of access modes • Scalability of collection and publication methods • Index methods and discovery

  14. ISI NCSA NCSA NCSA U.Tenn Distributed Information Services RootServers ReferralServer Replicated servers mds.globus.org:389 NCSA NASA DOE NPACI Remos NWS SNMP Organization Servers Index Server(s)

  15. Grid Services (3):Resource Management • Issues include: • Locating and selecting resources • Allocating resources • Authentication, process creation • Other activities required to prepare a resource for use; monitoring, control • End-to-end management/co-allocation • Diverse resources: CPU, disk, network • Reservation

  16. Resource Management Services • Globus Resource Allocation Manager (GRAM) • Uniform interface to resource management • Integration with security, policy • Co-allocation services • Coordinated allocation across multiple resources • Globus Arch. for Reservation and Allocation • Network and CPU quality of service • Immediate and advance reservations • Resource brokers: e.g., Condor

  17. “10 GFlops, EOS data, 20 Mb/sec -- for 20 mins” GRAM GRAM GRAM Resource Management Architecture Info service: location + selection Metacomputing Directory Service Resource Broker “What computers?” “What speed?” “When available?” “20 Mb/sec” GRAM Globus Resource Allocation Managers “50 processors + storage from 10:20 to 10:40 pm” Fork LSF EASYLL Condor etc.

  18. Local Resource Management MDS client API calls to locate resources GRAM Client MDS Update MDS with resource state information GRAM client API calls to request resource allocation and process creation. Site boundary GramReporter Query current status of resource Gatekeeper Local Resource Manager Create Authentication Allocate & create processes Request Job Manager Globus Security Infrastructure Process Parse Monitor & control Process RSL Library Process

  19. Advanced Resource Management • Provide end-to-end Quality of Service to applications. This requires: • Discovery and selection of resources • Allocation of resources • Advance reservation of resources Workstation Workstation Supercomputer Supercomputer Workstation Workstation

  20. GARA and Differentiated Services Server Client GARA API Diffserv Resource Manager Diffserv Resource Manager

  21. Scheduling Bulk Transferand High-Priority Transfers

  22. Integrated Policy Management • Required to control reservation and scheduling • Determine who can to what to whom • Integral part of resource management • Resource application, applicationresource • Next step after authentication • Need to integrate with and augment existing approaches • Access control lists, capabilities, usage certificates

  23. Policy: Technical Approaches • Single API to alternative mechanisms • Similar to security infrastructure • Integration with Globus security model and Globus resource management components • Basic policy mechanism in current system • Research questions • Reusable policy structures for resource specification/management • Policy aware resource discovery/scheduling

  24. Grid Services (4):Storage and I/O Services • Access to remote data (GASS) • Uniform access to diverse storage management systems • Cache management • High-speed, secure transport: gsiftp • Integration with metadata & storage systems • Communication (Nexus, GlobusIO) • Application-level interfaces to comm services • Multiple methods: reliable/unreliable, IP/other, unicast/multicast • Quality of service interfaces

  25. Current Technology Focus Areas • Advanced resource management techniques • GARA: Globus Arch. for Resv. & Allocation • High-end data-intensive applications • “Data Grid” • Interfaces to commodity technologies • CoG Kit: Commodity Grid Toolkits • Distance visualization • NOVA: Network Optimized Visualization Arch. With supporting work on info/instr., policy, accounting, authentication/authorization, etc.

  26. The Grid Forumhttp://www.gridforum.org • IETF-like community forum for discussion & definition of Grid infrastructure • First two meetings (June 16-18, Oct 18-20) attracted 150 people • 9 working groups established in security, information infrastructure, resource management, accounting, etc. • Next mtg: San Diego March 22-24 2000 • See also European Grid Forum • www.egrid.org

  27. b-Grid(“Broadband Experimental Terascale Access”) • A proposal to NSF to plan (& build) a national infrastructure for computer systems research • dedicated to research • of a scale that permits realistic experimentation • of a scale that encourages participation by adventurous applications groups • a place for computer and application scientists to tackle problems together • Initial plan is for O(20) Linux clusters, each with O(30) nodes, O(2 TB) disk, Gb/s network http://dsl.cs.uchicago.edu/beta

  28. Summary: Where We Are • Solid technology base for security, resource management, information services • Globus v1.1 completed, with all core services complete, robust, and documented • Many tool projects are leveraging this considerable investment in infrastructure • Substantial deployment activities and application experiments • New R&D in commodity grids, resource management, distance viz, data grids http://www.globus.org

  29. Case Study 1:Online Instrumentation Advanced Photon Source wide-area dissemination desktop & VR clients with shared controls real-time collection archival storage tomographic reconstruction DOE X-ray source grand challenge: ANL, USC/ISI, NIST, U.Chicago

  30. Case Study 2:Distributed Supercomputing • Starting point: SF-Express parallel simulation code • Globus mechanisms for • Resource allocation • Distributed startup • I/O and configuration • Fault detection • 100K vehicles (2002 goal) using 13 computers, 1386 nodes, 9 sites NCSA Origin Caltech Exemplar CEWES SP Maui SP SF-Express Distributed Interactive Simulation: Caltech, USC/ISI

  31. OVERFLOW with latency-tolerant algorithms MPICH-G “Grid-enabled” message passing Globus services Security Directory Scheduling Process mgmt Communication ARC SGI O2000 (California) Argonne SGI O2000 (Illinois) OVERFLOW simulation: NASA Ames

  32. Case Study 3:Collaborative Engineering • Manipulate shared virtual space, with • Simulation components • Multiple flows: Control, Text, Video, Audio, Database, Simulation, Tracking, Haptics, Rendering • Uses Globus comms: (un)reliable uni/multicast • Future: Security, QoS, allocation, reservation CAVERNsoft: UIC Electronic Visualization Laboratory

  33. Case Study 4:High-Throughput Computing • Schedule many independent tasks (e.g., parameter study) • Uses Globus security, discovery, data access, scheduling • Future: Reservation, accounting, code management, etc. Deadline Cost Available Machines Nimrod-G: Monash University

  34. Case Study 5:Problem Solving Environment • Problem solving environment for comp. chemistry • Globus services used for authentication, remote job submission, monitoring, and control • Future: distributed data archive, resource discovery, charging ECCE’: Pacific Northwest National Laboratory

More Related