
Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003

Presentation Transcript


  1. Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003 Toulouse, France

  2. The Grid Problem
     • Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources (from “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”)
     • Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of:
       • central location
       • central control
       • omniscience
       • existing trust relationships

  3. The Data Grid Problem
     “Enable a geographically distributed community [of thousands] to perform sophisticated, computationally intensive analyses on Petabytes of data”
     • Sounds like a separate class of problem, but is actually a superset.
     • So all work done on “Grid Problems” applies to “DataGrid Problems”. We just need some additional tools.

  4. Globus Approach
     • Software toolkit addressing key technical areas
     • Offer a modular “bag of technologies”
     • Enable incremental development of grid-enabled tools and applications
     • Define and standardize grid protocols and APIs (Our software development supports this goal.)
     • Focus is on inter-domain issues, not clustering
     • Supports collaborative resource use spanning multiple organizations
     • Integrates cleanly with intra-domain services
     • Creates a “collective” service layer

  5. Major Data Grid Projects
     • Earth System Grid (DOE Office of Science)
       • DG technologies, climate applications
     • European Data Grid (EU)
       • DG technologies & deployment in EU
     • GriPhyN – Grid Physics Network (NSF ITR)
       • Investigation of “Virtual Data” concept
     • Particle Physics Data Grid (DOE Science)
       • DG applications for HENP experiments

  6. Basic Data Grid Services
     1. GridFTP: Data Transfer and Access
        • Common protocol for data movement
        • Secure, efficient, reliable, flexible, extensible, etc.
        • Grid Forum (Internet) Draft
        • Family of tools supporting this protocol
          • Wu-ftpd, ncftp, Globus Toolkit SDKs, etc.
     2. Replica Management Architecture
        Simple scheme for managing:
        • multiple copies of files
        • collections of files

  7. GridFTP: Basic Approach
     • FTP is defined by several IETF RFCs
     • Start with most commonly used subset
       • Standard FTP: get/put etc., 3rd-party transfer
     • Implement standard but often unused features
       • GSS binding, extended directory listing, simple restart
     • Extend in various ways, while preserving interoperability with existing servers

  8. Features of GridFTP
     • Grid Security Infrastructure and Kerberos support: robust and flexible authentication, integrity, and confidentiality
     • Third-party control of data transfer: a user or application at one site initiates, monitors, and controls a data transfer between two other sites
     • Parallel data transfer: on wide-area links, use multiple TCP streams in parallel between the same source and destination
     • Striped data transfer: use multiple TCP streams to transfer data that is striped or interleaved across multiple servers
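The parallel-transfer idea on this slide can be illustrated with a minimal sketch of how a file might be divided into byte ranges, one per TCP stream. This is only an illustration of the partitioning step; the function name `split_ranges` is hypothetical and the code does not reproduce the GridFTP wire protocol or any Globus API.

```python
def split_ranges(file_size: int, streams: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover the file exactly once.

    Hypothetical helper: each pair is the region one parallel stream
    would carry. Streams that would receive zero bytes are omitted.
    """
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams >= 1")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    for offset, length in split_ranges(10, 4):
        print(f"stream carries bytes {offset}..{offset + length - 1}")
```

The same partitioning applies to striping if each range is served by a different server rather than a different stream on one server.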

  9. Features of GridFTP (cont.)
     • Partial file transfer: standard FTP allows transfer of the remainder of a file starting at an offset. GridFTP supports transfers of arbitrary subsets or regions of a file
     • Automatic negotiation of TCP buffer/window sizes: optimal settings for TCP buffer/window sizes can dramatically improve performance
     • Support for reliable and restartable data transfer: the FTP standard includes basic features for restart that are not widely implemented. GridFTP exploits these features and extends them.
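Partial transfer and restart fit together: if the receiver records which regions of a file have already arrived, a restart only needs to request the regions that are still missing. The sketch below shows that bookkeeping under the assumption that received regions are tracked as half-open `(start, end)` byte intervals. The name `missing_ranges` is hypothetical, and this is not the restart-marker format GridFTP itself uses.

```python
def missing_ranges(file_size: int,
                   received: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the half-open (start, end) intervals not yet received.

    `received` may be unsorted and may overlap; both cases are handled.
    """
    gaps = []
    cursor = 0  # first byte not yet known to be covered
    for start, end in sorted(received):
        if start > cursor:
            gaps.append((cursor, min(start, file_size)))
        cursor = max(cursor, end)
        if cursor >= file_size:
            break
    if cursor < file_size:
        gaps.append((cursor, file_size))
    return gaps


if __name__ == "__main__":
    # A 100-byte file where bytes 0-29 and 50-69 arrived before a failure.
    print(missing_ranges(100, [(0, 30), (50, 70)]))
```

Standard FTP restart corresponds to the special case where the only missing region is the tail of the file.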

  10. GridFTP for Efficient WAN Data Transfer
      • Secure authentication
      • Parallel transfer gets job done quickly
      • Partial file access gets only required data
      • Up to 2.8 Gb/s using a striped server architecture
      [Figure labels: Parallel Transfer: fully utilizes bandwidth of network interface on single nodes. Parallel Filesystem (shown twice). Striped Transfer: fully utilizes bandwidth of Gb+ WAN using multiple nodes.]
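Why buffer sizes and parallel streams matter on a WAN can be seen from the bandwidth-delay product: a TCP connection can keep at most one window of data in flight per round trip, so filling a link needs a window of roughly bandwidth × round-trip time. The sketch below works through that arithmetic. The link speed and round-trip time in the example are illustrative assumptions, not figures from the slides, and the helper names are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes (integer arithmetic)."""
    return bandwidth_bits_per_s * rtt_ms // (8 * 1000)


def per_stream_buffer(bandwidth_bits_per_s: int, rtt_ms: int,
                      streams: int) -> int:
    """Approximate buffer each of `streams` parallel connections needs
    so that together they can fill the link (rounded up)."""
    total = bdp_bytes(bandwidth_bits_per_s, rtt_ms)
    return -(-total // streams)  # ceiling division


if __name__ == "__main__":
    # Assumed example: 1 Gb/s path with a 100 ms round-trip time.
    print(bdp_bytes(1_000_000_000, 100))             # bytes for one stream
    print(per_stream_buffer(1_000_000_000, 100, 4))  # bytes per stream, 4 streams
```

With these assumed numbers a single stream would need about 12.5 MB of buffer, far above typical defaults of the period, which is one reason several streams with smaller buffers can perform better.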

  11. Current Data delivery process: ftp based
      • Pull: semi-anonymous ftp
        • Product ready
        • Email sent to user with instructions and password
        • User connects by ftp as “anonymous” with the provided password
        • FTP daemon positions user in the appropriate directory
        • User pulls data
      • Push: routine data flows to high-volume users
        • Account provided on remote system
        • When data is available, it is pushed to the remote system

  12. Potential Future data delivery: GRIDftp based
      • For routine multiple usage customers
        • Establish “Certificate process” with customer
          • Self-signed certificate authority
          • Customer generates private/public key pair
          • Generate user certificate with public key
          • Add user certificate to list of trusted users
        • Customer must install GridFTP client
          • Globus toolkit data management client bundle
          • Gsincftp
          • Java Commodity Grid Kit for Windows
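The last certificate step, adding a user certificate to a list of trusted users, can be pictured as maintaining a small table from certificate subject names to local accounts. The sketch below assumes a text file with one entry per line, a quoted subject followed by an account name, modelled on the layout of a Globus grid-mapfile. The slides do not say which mechanism would be used, and the subject names, account names, and function names here are invented for illustration.

```python
import shlex


def parse_trusted_users(text: str) -> dict[str, str]:
    """Parse lines of the form: "<certificate subject>" <local account>.

    Blank lines and lines starting with '#' are ignored.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex handles the quoted subject, which may contain spaces.
        subject, account = shlex.split(line)
        mapping[subject] = account
    return mapping


def is_trusted(mapping: dict[str, str], subject: str) -> bool:
    """True if the presented certificate subject is in the trusted list."""
    return subject in mapping


if __name__ == "__main__":
    # Hypothetical entries; real subjects come from the issued certificates.
    sample = '''
    # subject                                account
    "/O=Example Agency/CN=Jane Customer"     jcustomer
    "/O=Example Agency/CN=Data Push Service" datapush
    '''
    users = parse_trusted_users(sample)
    print(is_trusted(users, "/O=Example Agency/CN=Jane Customer"))
```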

  13. Potential Future data delivery: GRIDftp based
      • For routine multiple usage customers
        • Pull
          • Product ready
          • Email notifies user that data is ready
          • User connects with GRIDftp, is authenticated by user certificate, is given access, and pulls data
        • Push
          • Account provided on remote system with host certificate and our user certificate
          • These GRID certificates establish a Virtual Organization between the two parties
          • When data is available, GRIDftp is used to push it to the remote system

  14. Potential Future data delivery: GRIDftp based
      • For single usage customers
        Process to:
        • Establish “Certificate process” with customer
        • Customer must install GridFTP client
        Currently seems too complex (not worth the effort)
        Would like to have a simplified method, such as:
        • Email a one-time-use “user certificate”
        • GRIDftp client built into the browser
