1 / 16

Reliable Data Movement using Globus GridFTP and RFT: New Developments in 2008

Reliable Data Movement using Globus GridFTP and RFT: New Developments in 2008. John Bresnahan Michael Link Raj Kettimuthu Argonne National Laboratory and The University of Chicago. GridFTP. High-performance, reliable data transfer protocol optimized for high-bandwidth wide-area networks

kasi
Download Presentation

Reliable Data Movement using Globus GridFTP and RFT: New Developments in 2008

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Reliable Data Movement using Globus GridFTP and RFT: New Developments in 2008 John Bresnahan Michael Link Raj Kettimuthu Argonne National Laboratory and The University of Chicago

  2. GridFTP • High-performance, reliable data transfer protocol optimized for high-bandwidth wide-area networks • Based on FTP protocol - defines extensions for high-performance operation and security • Standardized through Open Grid Forum (OGF) • Globus implementation of GridFTP is widely used for bulk data movement • Average of more then 3 million transfers per day

  3. GridFTP GridFTP Server CC Client DC CC GridFTP Server

  4. Key features • Performance • Security • GSI, SSH • Username/password and anonymous • Cluster-to-cluster data movement/striping • Support for reliable and restartable transfers • Modular • Easy to plug-in alternate transport protocols • Storage systems too - HPSS, SRB

  5. Globus Reliable File Transfer Service (RFT) • GridFTP client that provides more reliability • GridFTP - on demand transfer service • Not a queuing service • RFT • Queues requests • Orchestrates transfers on client’s behalf • Writes to persistent store • Recovers from GridFTP and RFT service failures

  6. RFT Client SOAP Messages Notifications(Optional) RFT Service Persistent Store CC CC DC GridFTP Server GridFTP Server

  7. New features in GridFTP • GridFTP information provider service • Max connections • Open connections • Load • Higher level services can utilize this information for scheduling data transfers • Help with selecting the appropriate replica of data

  8. Concurrency GridFTP Server CC Client DC CC GridFTP Server

  9. Concurrency • Client submits concurrent transfer requests to the server • Significantly improves the performance of lots of small files transfers • APS used this feature to transfer 1 TB of data to Australia at 30x faster than SCP • LIGO used this feature to transfer 1.5 TB of data from Milwaukee to Germany at 80 MB/s

  10. Multicasting

  11. GridFTP Overlay Network BWsd

  12. Bottleneck detection • Determine the bottleneck for the data transfer performance • Network read, network write, disk read, disk write • Netlogger is used to determine these values • Netlogger is shipped with Globus, starting from 4.2 • ./configure --enable-netlogger • make gridftp globus_xio_netlogger_driver

  13. Popen • Popen XIO driver • allows users to open pipes to the standard IO of existing programs • leverage programs like you can with UNIX pipes • globus-gridftp-server -p 5000 -fs-whitelist popen,file,ordering -aa • globus-url-copy -dst-fsstack popen:argv=#/usr/bin/zip#/home/bresnaha/text.txt.zip#-,ordering ftp://localhost:5000/home/bresnaha/text.txt ftp://localhost:5000/y

  14. New features in RFT • Command line client in C • A new feature rich and fast command line client. • Globus-crft • GT4.2 RFT has more robust retry mechanisms. • help prevent overheating in certain cluster configurations.

  15. Connection caching • Instead of only caching connections across a users single transfer request they are now cached against all transfer requests. • This has dramatic performance increases when a user performs multiple requests • Eliminate authentication overhead on the control and data channels

  16. Connection caching • Measured performance improvement for jobs submitted using Condor-G • For 500 jobs - each job requiring file stageIn, stageOut and cleanup (RFT tasks) • 30% improvement in overall performance • No timeout due to overwhelming connection requests to GridFTP servers

More Related