
Globus – Part II

Globus – Part II. Sathish Vadhiyar. Globus Information Service. MDS: Metacomputing Directory Service (original name), Monitoring and Discovery Service (later name). For publishing and accessing system and application data. Access to MDS information can be restricted using GSI.


Presentation Transcript


  1. Globus – Part II Sathish Vadhiyar

  2. Globus Information Service

  3. MDS • Metacomputing Directory Service (original name), Monitoring and Discovery Service (later name) • For publishing and accessing system and application data • Access to MDS information can be restricted using GSI • Interacts with local information services – hour-glass mechanism • Provides caching to reduce how often up-to-date information must be transferred, lessening network overhead
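The caching idea on this slide can be sketched as a time-to-live cache placed in front of an information provider. This is only an illustration: the class name `TTLCache`, the `ttl` parameter, and the `fetch` callback are hypothetical and are not part of any MDS API.

```python
import time


class TTLCache:
    """Keep provider results for `ttl` seconds (hypothetical helper, not an MDS API)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock          # injectable clock, useful for testing
        self._store = {}            # key -> (expiry time, cached value)

    def get(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # still fresh: the provider is not contacted
        value = fetch()             # missing or expired: ask the provider again
        self._store[key] = (now + self.ttl, value)
        return value
```

A longer `ttl` saves more network traffic at the cost of staler answers, which matters for fast-changing values such as load status.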

  4. MDS • Integrates existing systems while providing a uniform and extensible data model • Uniform API • Adopts its data representation, API, query language and protocol from the LDAP directory service • Uses 2 protocols • GRIP – for providing information about entities • GRRP – for registering entities • LDAP query language supports: • Search • Enquiry • Subscription

  5. MDS Architecture GIIS – Grid Index Information Service GRIS – Grid Resource Information Service

  6. MDS • Support for multiple information service providers – information providers are specified on a per-attribute basis • MDS Data: • System information: architecture, OS • Network information • Load status • Additional information sent to GIIS by GRAM reporter • Job status • Queue information • Information can be viewed through a web browser or client commands

  7. MDS • Contains entries, where each entry is associated with one or more attribute:value pairs • Each entry is associated with a distinguished name • Object classes are associated with entries to define their object types
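A minimal sketch of such an entry, assuming a plain Python dictionary stands in for a directory record. The distinguished name, the object class name, and the attribute names below are invented for illustration and do not come from the slides or from a real MDS schema.

```python
def parse_dn(dn):
    """Split a distinguished name into (attribute, value) pairs, most specific first.

    Simplification: escaped commas, which real LDAP DNs allow, are not handled.
    """
    pairs = []
    for component in dn.split(","):
        attr, _, value = component.strip().partition("=")
        pairs.append((attr, value))
    return pairs


# Hypothetical entry: a DN, its object classes, and attribute:value pairs.
entry = {
    "dn": "hn=node1.example.org, ou=Example Lab, o=Grid",
    "objectclass": ["ExampleComputeResource"],
    "attributes": {"architecture": "x86", "os": "Linux"},
}

components = parse_dn(entry["dn"])
```

Reading the components from left to right walks from the entry itself up toward the root of the directory tree.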

  8. Distinguished name example

  9. Another Example

  10. Distinguished names for Networks

  11. Globus Data Grid

  12. Data Grid • Challenges: • Terabytes to petabytes of data • Query management over this huge volume of data • Cache management • Providing gigabit/sec QoS • Coscheduling data transfers and computation • Selection of dataset replicas • Maximizing use of scarce storage, computation and network resources

  13. Data Grid Motivation • Application requirements: • A reliable secure high-performance data transfer protocol • Management of multiple copies of files and collections of files

  14. Data Grid Architecture

  15. GridFTP • Secure file transfer over the Grid • Multiple data channels for parallel transfers – using multiple TCP streams in parallel to improve aggregate bandwidth • Partial file transfers • Third-party (direct server-to-server) transfers, by adding GSSAPI security to the existing third-party data transfers in the FTP standard – transfers between 2 servers mediated by a third-party client • GSSAPI operations authenticate the third party to the source and destination machines of the data transfer
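To make the parallel and partial transfer bullets concrete, the sketch below shows one way a file could be divided into byte ranges, one per TCP stream. The function `split_ranges` is hypothetical; GridFTP handles this inside the protocol, and this is not its actual partitioning scheme.

```python
def split_ranges(size, streams):
    """Return (offset, length) pairs that cover [0, size) in near-equal parts.

    Hypothetical helper: each pair could be fetched as one partial transfer
    on its own stream. Empty ranges are omitted when size < streams.
    """
    if streams < 1:
        raise ValueError("streams must be at least 1")
    base, extra = divmod(size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges
```

Because every range carries its own offset, the receiver can write the parts in whatever order they arrive.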

  16. GridFTP contd… • Authenticated data channels – both GSI and Kerberos security • Reusable data channels • Striped data transfers • 2 libraries: • globus_ftp_control library – implements the control channel API • globus_ftp_client library – implements the GridFTP API • Plugin mechanisms for fault tolerance, performance monitoring, and extended data processing

  17. Globus Replica Management Architecture • Replica management • To improve the performance or availability of data access • Mainly for access to “published” resources – read-only model • Functions: • Architecture: • Lower level replica catalog API • Higher level replica management API

  18. Replica catalog • Provides mapping between logical names of files/locations and physical objects on storage systems • Stores 3 kinds of entries • Logical collection – user defined collections of files – file aggregation • Location entries – physical locations of files • Logical files – globally unique names • Replica catalog API provides operations on the replica catalog • Replica management API provides session management, catalog creation, file maintenance, access control • Implemented with LDAP
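The three kinds of entries can be sketched as a small in-memory data structure. The class and method names below are hypothetical and the URLs are placeholders; the real catalog is an LDAP directory accessed through the Globus replica catalog API, which this does not reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    """Location entry: a storage site holding some or all files of a collection."""
    url_prefix: str
    files: set = field(default_factory=set)


@dataclass
class LogicalCollection:
    """Logical collection: a named group of logical files plus its locations."""
    name: str
    logical_files: set = field(default_factory=set)
    locations: list = field(default_factory=list)

    def register_location(self, url_prefix, files):
        unknown = set(files) - self.logical_files
        if unknown:
            raise ValueError(f"not in collection: {sorted(unknown)}")
        self.locations.append(Location(url_prefix, set(files)))

    def replicas(self, logical_file):
        """Map a logical file name to the physical URLs of its replicas."""
        return [
            loc.url_prefix + "/" + logical_file
            for loc in self.locations
            if logical_file in loc.files
        ]
```

A location may hold only part of a collection, which is why each location entry keeps its own file set.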

  19. Replica management • Globus Replica Management integrates the Globus Replica Catalog (for keeping track of replicated files) and GridFTP (for moving data) and provides replica management capabilities for data grids. • The globus_replica_management library provides client functions that allow files to be registered with the replica management service, published to replica locations, and moved among multiple locations. • Managing the copying and placement of files in a distributed computing system so as to improve the performance of data analysis

  20. Replica management service - functions • Registration of files with the replica management service • Creation and deletion of replicas of previously registered files • Enquiries concerning the location and performance characteristics of replicas. • Replica selection based on performance characteristics

  21. Replica management • Replica management API – combines storage system operations with calls to low-level catalog API functions • Replica management system controls where and when copies are created and provides information about copies • But does not ensure file consistency

  22. RM API • Session management • Session handles and attributes • Restart • Rollback • Catalog creation and file management • Creating catalog entries • Registering files • Publishing files • Copying, deleting files • Future ideas • Incorporating advance reservation • Automatic replica selection and creation • Data grid projects • http://www.globus.org/datagrid/projects.html
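The rollback item can be illustrated with a session that remembers how to undo each completed step. This is a generic sketch of the idea, assuming each operation comes with an undo action; the `Session` class is hypothetical and is not the Globus replica management API.

```python
class Session:
    """Record an undo action for every completed step (hypothetical sketch)."""

    def __init__(self):
        self._undo = []

    def do(self, action, undo):
        action()                    # if this raises, no undo is recorded for it
        self._undo.append(undo)

    def rollback(self):
        while self._undo:
            self._undo.pop()()      # undo the most recent step first
```

For example, copying a file and then registering it would be rolled back by unregistering first and deleting the copy second, so the catalog never points at a missing file.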

  23. Replica Catalog Illustration

  24. Replica Selection in Globus Data Grid (Vazhkudai et al.) • Replica selection uses MDS for information regarding characteristics of storage systems • LDAP information is organized as a DIT (Directory Information Tree) • Each storage resource in the Data Grid incorporates a GRIS • LDAP can execute shell scripts in the background to obtain dynamic attributes such as availableSpace, mountPoint etc. • Static attributes like seek times can be entered by the system administrator • Attributes like data transfer rates across networks to clients can be obtained based on past performance, i.e., historical data • ClassAds can also be used for expressing storage attributes
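The slide says transfer rates are derived from historical data but does not say how. As an assumption for illustration only, the sketch below averages the most recent observations; the function name and the window size are hypothetical.

```python
def estimate_rate(history, window=5):
    """Mean of the most recent `window` observed transfer rates.

    Returns None when there is no history. A plain moving average is an
    assumption here, not the estimator used by the replica selection service.
    """
    recent = history[-window:]
    if not recent:
        return None
    return sum(recent) / len(recent)
```

A short window reacts quickly to changing network conditions, while a long one smooths out isolated slow transfers.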

  25. Directory for Storage GRIS

  26. Metadata Specification

  27. Performance Data Specification

  28. Steps in Replica Management • Application queries metadata expressing desired characteristics of logical files • A logical file is returned • Application queries replica catalog for replica instances for the logical file • Storage broker helps to choose a particular replica

  29. Replica Selection

  30. Storage Architecture steps • Application presents classAds describing its replica requirements to the storage broker (SB) • SB does search: • Queries replica catalogs for the list of all replicas • Queries the individual GRIS of each replica about its characteristics • Collects all information and proceeds to matching • Match: • Converts replica capabilities to replica classAds • Matches application classAds to replica classAds • Accesses file using GridFTP
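The match step can be sketched as filter-then-rank over the collected replica descriptions. ClassAds are an expression language of their own; here, as a stated simplification, the application's requirements and rank are modelled as Python callables, and the attribute names and URLs are hypothetical.

```python
def select_replica(request, replicas):
    """Return the highest-ranked replica that meets the requirements, or None.

    `request` holds two callables standing in for ClassAd expressions:
    "requirements" (replica -> bool) and "rank" (replica -> number).
    """
    matches = [r for r in replicas if request["requirements"](r)]
    if not matches:
        return None
    return max(matches, key=request["rank"])


# Hypothetical replica descriptions, as a broker might assemble them.
replicas = [
    {"url": "gsiftp://site1.example.org/a.dat", "transferRate": 12.0, "seekTime": 8},
    {"url": "gsiftp://site2.example.org/a.dat", "transferRate": 40.0, "seekTime": 20},
    {"url": "gsiftp://site3.example.org/a.dat", "transferRate": 25.0, "seekTime": 5},
]

request = {
    "requirements": lambda r: r["seekTime"] <= 10,
    "rank": lambda r: r["transferRate"],
}

best = select_replica(request, replicas)
```

In this example the fastest replica is excluded by the seek-time requirement, so the broker picks the fastest of the remaining two.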

  31. Globus References / sources / credits • Grid Information Services for Distributed Resource Sharing. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman. Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001. • Usage of LDAP in Globus. I. Foster, G. von Laszewski. This short note describes the use of LDAP in the Globus toolkit. It answers three questions: What is LDAP? Where is it used? and Why is it used in Globus? • A Directory Service for Configuring High-Performance Distributed Computations. S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke. Proc. 6th IEEE Symposium on High-Performance Distributed Computing, pp. 365-375, 1997. Describes the Metacomputing Directory Service used to maintain information about Globus components.

  32. Globus References / sources / credits • The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke. Journal of Network and Computer Applications, 23:187-200, 2001 (based on conference publication from Proceedings of NetStore Conference 1999). • Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing. B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke. IEEE Mass Storage Conference, 2001. Presents the design and performance characteristics of two fundamental technologies for data management. • Replica Selection in the Globus Data Grid. S. Vazhkudai, S. Tuecke, I. Foster. Proceedings of the First IEEE/ACM International Conference on Cluster Computing and the Grid (CCGRID 2001), pp. 106-113, IEEE Computer Society Press, May 2001. Discusses a high-level replica selection service that uses information regarding replica location and user preferences to guide selection from among storage replica alternatives.

  33. JUNK !!

  34. RFT (Reliable File Transfer) • Treat movement of multiple files as a single job • Accept transfer requests and reliably manage requests • OGSI compliant • To transfer data reliably between two GridFTP servers • Uses Grid Service Handles (GSH) • Acts as a proxy for the user, acts as client on user’s behalf for third-party transfers

  35. RFT • Client submits a SOAP description of the data transfer job • Maintains checkpoints in databases • Supports both “push” and “pull” mechanisms
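A rough sketch of how checkpoints let a multi-file job resume after a failure. Everything here is an assumption for illustration: `run_job` is a hypothetical function, the `checkpoint` set stands in for RFT's database, and `copy` stands in for a GridFTP transfer of one file.

```python
def run_job(files, copy, checkpoint):
    """Transfer each file once, recording progress after every success.

    `checkpoint` is a set the caller keeps between runs (a stand-in for a
    database table). If `copy` raises, the files already recorded are
    skipped on the next run.
    """
    for name in files:
        if name in checkpoint:
            continue                # finished in an earlier run
        copy(name)                  # may raise; earlier progress is already saved
        checkpoint.add(name)
    return checkpoint
```

Calling `run_job` again with the same checkpoint after a failure retries only the files that were not completed.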

  36. Data Grid Replica Services • Need for metadata services • Various kinds: • Application metadata • Replica metadata • System configuration metadata • Replica management • To improve the performance or availability of data access • Mainly for access to “published” resources – read-only model

  37. Replica Catalog • Provides mappings between logical names for files or collections and one or more copies of those objects on physical systems • Services provided by replica catalog: • Registering a list of files as a logical collection • Registering the physical location of a complete or partial replica of a logical collection • Registering information about a particular logical file in a logical collection • Modifying the contents of registered entities of the catalog • Responding to queries of the catalog • The Globus Replica Catalog supports replica management by providing mappings between logical names for files and one or more copies of the files on physical storage systems
