The Globus Striped GridFTP Framework and Server. Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu 2 Ian Foster 1,2 1 Math & Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A.


The Globus Striped GridFTP Framework and Server

Bill Allcock¹ (presenting), John Bresnahan¹, Raj Kettimuthu¹, Mike Link², Catalin Dumitrescu², Ioan Raicu², Ian Foster¹,²

¹ Math & Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A.

² Dept of Computer Science, University of Chicago, Chicago, IL 60615, U.S.A.

Introduction
  • Problem we are addressing
  • A brief discussion of the GridFTP Protocol
  • Design / Architecture of our implementation
  • Performance results
Technology Drivers
  • Internet revolution: 100M+ hosts
    • Collaboration & sharing the norm
  • Universal Moore’s law: ×10³ per 10 yrs
    • Sensors as well as computers
  • Petascale data tsunami
    • Gating step is analysis
  • And our old infrastructure?
[Figure labels: "114 genomes"; "735 in progress"; "You are here"]
What issues are we addressing?
  • Striping
    • storage systems are often clusters, and we need to be able to utilize all of that parallelism
  • Collective Operations
    • essentially, the striping should be invisible to the outside world
  • Uniform interface
    • Ideally, any data source can be treated the same way
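
To make the striping idea concrete, here is a minimal sketch of how a file might be divided across the nodes of a storage cluster. The round-robin block assignment, the block size, and the function name `stripe_layout` are all assumptions made for illustration; the slides do not say how the server actually lays out data.

```python
def stripe_layout(file_size, block_size, num_nodes):
    """Assign (offset, length) blocks to nodes round-robin (illustrative only)."""
    if block_size <= 0 or num_nodes <= 0:
        raise ValueError("block_size and num_nodes must be positive")
    layout = [[] for _ in range(num_nodes)]
    offset = 0
    index = 0
    while offset < file_size:
        # The final block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        layout[index % num_nodes].append((offset, length))
        offset += length
        index += 1
    return layout


if __name__ == "__main__":
    # A 10-byte "file" in 4-byte blocks over 2 nodes.
    for node, blocks in enumerate(stripe_layout(10, 4, 2)):
        print(f"node {node}: {blocks}")
```

A client that only sees the whole file, and never the per-node block lists, is what "invisible to the outside world" would look like under this layout.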
What issues are we addressing?
  • Network Protocol Independence
    • TCP has well-known issues with high bandwidth-delay product networks
    • Need to be able to take advantage of aggressive protocols on circuits.
  • Diverse Failure Modes
    • Much happening under the covers, so must be resilient to failures
  • End-to-End Performance
    • We need to be able to manage performance for a wide range of resources
What is GridFTP?
  • A secure, robust, fast, efficient, standards based, widely accepted data transfer protocol
  • A Protocol
    • Multiple independent implementations can interoperate
      • This works. Fermilab has an implementation in its dCache system, and U. Virginia has a .NET implementation; both work with ours.
      • Lots of people have developed clients independent of the Globus Project.
  • The Globus Toolkit supplies a reference implementation:
    • Server
    • Client tools (globus-url-copy)
    • Development Libraries
GridFTP: The Protocol
  • Existing standards
    • RFC 959: File Transfer Protocol
    • RFC 2228: FTP Security Extensions
    • RFC 2389: Feature Negotiation for the File Transfer Protocol
    • Draft: FTP Extensions
    • GridFTP: Protocol Extensions to FTP for the Grid
      • Grid Forum Recommendation
      • GFD.20
      • http://www.ggf.org/documents/GWD-R/GFD-R.020.pdf
What did the GridFTP protocol add?
  • Extended Block Mode
    • data is sent in “packets” with a header containing a 64-bit offset and length
    • allows out-of-order reception of packets
  • Restart and Performance Markers
    • allows for robust restart and performance monitoring
  • SPAS/SPOR
    • striped PASV and striped PORT
    • allows a list of IP/ports to be returned
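
The following sketch illustrates why a per-block offset and length make out-of-order reception possible. The 17-byte header layout used here (an 8-bit descriptor, then a 64-bit byte count, then a 64-bit offset, in network byte order) is an assumption based on a reading of GFD.20 and should be checked against the specification; descriptor flags such as end-of-data are not modeled, and the function names are hypothetical.

```python
import struct

# Assumed header layout: descriptor (1 byte), byte count (8), offset (8), big-endian.
HEADER = struct.Struct(">BQQ")


def pack_block(offset, payload, descriptor=0):
    """Prefix a payload with its header."""
    return HEADER.pack(descriptor, len(payload), offset) + payload


def unpack_block(block):
    """Return (offset, payload) from one framed block."""
    _descriptor, length, offset = HEADER.unpack_from(block)
    payload = block[HEADER.size:HEADER.size + length]
    return offset, payload


def reassemble(blocks, total_size):
    """Place each payload at its offset, regardless of arrival order."""
    buf = bytearray(total_size)
    for block in blocks:
        offset, payload = unpack_block(block)
        buf[offset:offset + len(payload)] = payload
    return bytes(buf)


if __name__ == "__main__":
    data = b"hello, gridftp!"
    blocks = [pack_block(i, data[i:i + 4]) for i in range(0, len(data), 4)]
    blocks.reverse()  # simulate blocks arriving out of order
    print(reassemble(blocks, len(data)))
```

Because every block says where it belongs, a receiver does not need blocks from parallel or striped connections to arrive in sequence.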
What did the GridFTP Protocol add?
  • Data Channel Authentication
    • Needed since in third party transfer, you don’t know who will connect to the listener.
  • ESTO/ERET
    • allows for additional processing on the data prior to storage/transmission
    • We use this for partial file transfers
  • SBUF/ABUF
    • manual and automatic TCP buffer tuning
  • Options to set parallelism/striping parameters
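
As a rough illustration of manual TCP buffer tuning, the sketch below sizes a socket buffer from the bandwidth-delay product (bandwidth × round-trip time). Splitting that figure evenly across parallel streams is a common rule of thumb rather than something stated in the slides, and the helper names are hypothetical. The operating system may clamp or adjust the value actually applied.

```python
import socket


def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product in bytes, using integer arithmetic."""
    return bandwidth_bps * rtt_ms // 8000


def per_stream_buffer(bandwidth_bps, rtt_ms, streams=1):
    """Assumed rule of thumb: divide the BDP across parallel streams (rounded up)."""
    total = bdp_bytes(bandwidth_bps, rtt_ms)
    return -(-total // streams)


def tune_socket(sock, size):
    """Request send/receive buffers of `size` bytes; return what the OS reports."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    size = per_stream_buffer(1_000_000_000, 100, streams=4)  # 1 Gb/s, 100 ms RTT
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("requested", size, "granted", tune_socket(s, size))
```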
Client/Server vs 3rd Party

[Figure: two transfer models. Client Server Model: (1) establish control connection, (2) establish data connection. 3rd Party Transfer Model: (1) establish control connection, (2) establish control connection, (3) establish data connection.]
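
The third-party model can be sketched with plain RFC 959 commands: the client asks the receiving server to listen (`PASV`), relays the returned address to the sending server (`PORT`), and then starts the store and retrieve. The code below only builds that command plan from a `227` reply; it opens no connections, the function names and the "dest"/"source" labels are hypothetical, and a GridFTP striped transfer would use SPAS/SPOR with a list of addresses instead.

```python
import re


def parse_pasv_reply(reply):
    """Extract (host, port) from a '227 ... (h1,h2,h3,h4,p1,p2)' reply."""
    m = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if m is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    nums = [int(g) for g in m.groups()]
    host = ".".join(str(n) for n in nums[:4])
    port = nums[4] * 256 + nums[5]
    return host, port


def third_party_plan(pasv_reply, src_path, dst_path):
    """Return the (server, command) sequence a client would send."""
    host, port = parse_pasv_reply(pasv_reply)
    host_port = ",".join(host.split(".") + [str(port // 256), str(port % 256)])
    return [
        ("dest", "PASV"),                 # receiver listens, replies with 227
        ("source", f"PORT {host_port}"),  # sender is told where to connect
        ("dest", f"STOR {dst_path}"),
        ("source", f"RETR {src_path}"),
    ]


if __name__ == "__main__":
    reply = "227 Entering Passive Mode (192,168,1,5,195,80)"
    for server, command in third_party_plan(reply, "/data/in.dat", "/data/out.dat"):
        print(server, command)
```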

Overall Architecture

[Figure labels: Client PI; Control Channels; Server PI (×2); Internal IPC API (×2); DTP (×2); Data Channel]
  • Description of transfer: completely server-internal communication. Protocol is unspecified and left up to the implementation.
  • Info on transfer: restart markers, performance markers, etc. Server PI optionally processes these, then sends them to the client PI
Possible Configurations

[Figure: four arrangements of Control and Data components, labeled Typical Installation, Separate Processes, Striped Server, and Striped Server (future)]

Data Transfer Processor

[Figure labels, in order: Data source or sink; Data Storage Interface; Data Processing Module; Data Channel Protocol Module; Data channel]
Data Storage Interface
  • This is a very powerful abstraction
  • Several can be available and loaded dynamically via the ERET/ESTO commands
  • Anything that can implement the interface can be accessed via the GridFTP protocol
  • We have implemented
    • POSIX file (used for performance testing)
    • HPSS (tape system; IBM)
    • Storage Resource Broker (SRB; SDSC)
    • NeST (disk space reservation; UWis/Condor)
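
The value of the abstraction can be sketched as follows: transfer code is written once against a small interface, and any backend that implements it can serve as a source or sink. This is not the Globus DSI API; the class and method names (`DataStorageInterface`, `read`, `write`, `copy_range`) are hypothetical and chosen only to show the shape of the idea.

```python
import os
from abc import ABC, abstractmethod


class DataStorageInterface(ABC):
    """Hypothetical minimal storage interface: offset-addressed read and write."""

    @abstractmethod
    def read(self, offset, length): ...

    @abstractmethod
    def write(self, offset, data): ...


class MemoryStorage(DataStorageInterface):
    def __init__(self, initial=b""):
        self._buf = bytearray(initial)

    def read(self, offset, length):
        return bytes(self._buf[offset:offset + length])

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self._buf):
            self._buf.extend(b"\0" * (end - len(self._buf)))
        self._buf[offset:end] = data


class PosixFileStorage(DataStorageInterface):
    def __init__(self, path):
        self._path = path

    def read(self, offset, length):
        with open(self._path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def write(self, offset, data):
        mode = "r+b" if os.path.exists(self._path) else "w+b"
        with open(self._path, mode) as f:
            f.seek(offset)
            f.write(data)


def copy_range(src, dst, total, block_size=4):
    """Transfer logic that knows nothing about the backends it moves data between."""
    for offset in range(0, total, block_size):
        dst.write(offset, src.read(offset, min(block_size, total - offset)))


if __name__ == "__main__":
    src, dst = MemoryStorage(b"striped data"), MemoryStorage()
    copy_range(src, dst, 12)
    print(dst.read(0, 12))
```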
Extensible IO (XIO) system
  • Provides a framework that implements a Read/Write/Open/Close Abstraction
  • Drivers are written that implement the functionality (file, TCP, UDP, GSI, etc.)
  • Different functionality is achieved by building protocol stacks
  • GridFTP drivers will allow 3rd party applications to easily access files stored under a GridFTP server
  • Other drivers could be written to allow access to other data stores.
  • Changing drivers requires minimal change to the application code.
  • Ported GridFTP to use UDT in less than a day
    • AFTER the UDT driver was written
Globus XIO Approach

[Figure labels: Application; Globus XIO; Driver (×3); Network Protocol (×3); Disk; Special Device]

Globus XIO Framework
  • Moves the data from user to driver stack.
  • Manages the interactions between drivers.
  • Assists in the creation of drivers.
    • Asynchronous support.
    • Close and EOF Barriers.
    • Error checking
    • Internal API for passing operations down the stack.

[Figure labels: User API; Framework; Driver Stack (Transform, Transform, Transport)]
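
A driver stack of two transforms over a transport can be modeled in a few lines. This is a toy illustration of the layering idea only: it is not the Globus XIO C API, it is synchronous, and every class and function name here is hypothetical. Each driver handles an operation and passes it to the driver below; swapping the transport leaves the application code unchanged.

```python
import zlib


class Driver:
    """Base driver: by default, pass operations straight down the stack."""

    def __init__(self):
        self.below = None

    def write(self, data):
        self.below.write(data)

    def read(self):
        return self.below.read()


class MemoryTransport(Driver):
    """Transport driver at the bottom of the stack; stands in for TCP, UDP, or file."""

    def __init__(self):
        super().__init__()
        self.wire = []

    def write(self, data):
        self.wire.append(data)

    def read(self):
        return self.wire.pop(0)


class CompressTransform(Driver):
    def write(self, data):
        self.below.write(zlib.compress(data))

    def read(self):
        return zlib.decompress(self.below.read())


class XorTransform(Driver):
    """Stand-in for a security transform; NOT real encryption."""

    def __init__(self, key):
        super().__init__()
        self.key = key

    def _xor(self, data):
        return bytes(b ^ self.key for b in data)

    def write(self, data):
        self.below.write(self._xor(data))

    def read(self):
        return self._xor(self.below.read())


def build_stack(*drivers):
    """Link drivers listed top to bottom; the last one should be a transport."""
    for upper, lower in zip(drivers, drivers[1:]):
        upper.below = lower
    return drivers[0]


if __name__ == "__main__":
    transport = MemoryTransport()
    stack = build_stack(XorTransform(0x5A), CompressTransform(), transport)
    stack.write(b"x" * 100)
    print(len(transport.wire[0]), "bytes on the wire")
    print(stack.read() == b"x" * 100)
```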

What issues are we addressing?
  • Striping
    • storage systems are often clusters, and we need to be able to utilize all of that parallelism
  • Collective Operations
    • essentially, the striping should be invisible to the outside world
  • Uniform interface
    • Ideally, any data source can be treated the same way
What issues are we addressing?
  • Network Protocol Independence
    • TCP has well-known issues with high bandwidth-delay product networks
    • Need to be able to take advantage of aggressive protocols on circuits.
  • Diverse Failure Modes
    • Much happening under the covers, so must be resilient to failures
  • End-to-End Performance
    • We need to be able to manage performance for a wide range of resources
Summary
  • The GridFTP protocol provides a good set of features for data movement requirements in the Grid.
  • The Globus Striped Server (Zebra) implementation of this protocol provides a flexible design / architecture for integrating with different communities, storage systems, and protocols.
  • Our implementation is robust and performant over a range of environments.