TeraGrid Data Transfer

Jeffrey P. Gardner

Pittsburgh Supercomputing Center

gardnerj@psc.edu

Outline
  • GSISSH
    • Use passwordless login between TeraGrid machines
    • Hands-on Exercises
  • TeraGrid File Management
    • Data Transfer Performance
    • GridFTP
      • Terminology
      • TeraGrid Deployment
    • Hands-on Exercises
      • Use of GridFTP clients & servers to transfer files

CIG MCW, Boulder, CO

Hands-on: Preparation

Prepare for the exercises by logging in to NCSA and getting a valid proxy certificate.

Log in to tg-login.ncsa.teragrid.org:

ssh userid@tg-login.ncsa.teragrid.org

Enter your password:

xxxxxx

Get a valid proxy certificate:

tg-login1> grid-proxy-init

Enter GRID pass phrase for this identity: yyyyyy

Creating proxy . . . . . . . . . . . Done

Your proxy is valid until: Tue Jun 21 08:06:03 2005

GSISSH: SSH using TG Certificates
  • Now log in to TACC using GSISSH
    • tg-login> gsissh tg-login.tacc.teragrid.org
  • TA DA!
  • See that your NCSA certificate DN and user account name have been entered into TACC’s grid-mapfile
    • > grep -i userid /etc/grid-security/grid-mapfile
    • "/C=US/O=National Center for Supercomputing Applications/CN=Jeff Gardner" gardnerj
  • Logout of TACC
    • > exit
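
The grid-mapfile entry shown by grep pairs a quoted certificate DN with a local account name. A minimal sketch of reading such an entry, assuming the simple two-field layout seen on this slide (the helper name parse_gridmap_line is hypothetical and not part of any Globus tool):

```python
import shlex

def parse_gridmap_line(line):
    """Return (certificate DN, local account) for one entry, or None."""
    # shlex honours the double quotes around the DN and skips '#' comments.
    fields = shlex.split(line, comments=True)
    if len(fields) < 2:
        return None
    return fields[0], fields[1]

# Entry copied from the grep output on this slide.
entry = '"/C=US/O=National Center for Supercomputing Applications/CN=Jeff Gardner" gardnerj'
print(parse_gridmap_line(entry))
```

The DN contains spaces, which is why the entry is split with quote-aware parsing rather than a plain whitespace split.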

TeraGrid File Placement
  • No common cross-site filesystems (currently)
    • This will change very shortly!
    • NCSA, SDSC, TACC, ANL will install GPFS (“General Parallel File System”)
  • User controls where their data resides
    • Appropriate site(s)
    • Appropriate storage
      • Online Filesystem(s)
        • Speed, visibility, quotas, backup policy
        • Each filesystem directly accessible from single site
      • Mass Storage Systems
        • Long-term storage, slower access

TeraGrid File Movement
  • File movement responsibility of user
    • Between Online Filesystems
      • Intra-site
      • Cross-site*
    • Between Mass Storage and Online Filesystems
      • Intra-site*
      • Cross-site*

* Session focuses on these types of transfers

TeraGrid Transfer Environment
  • TeraGrid backbone bandwidth means Wide Area Network is rarely a bottleneck
    • SDSC<->Caltech<->NCSA<->PSC: 40 Gb/sec
    • NCSA<->TACC: 10 Gb/sec
  • GSI authentication and proxy certificates provide automagic security for transfers
    • just do “grid-proxy-init” and you’re in
  • Transfer requests can be integrated into job execution scripts
    • Moving input data to site(s) of job execution
    • Moving results to another filesystem, site, or archive
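
As a rough sketch of integrating transfers into a job, the three steps can be run in order: stage the input, compute, stage the results out. The file names, the /scratch path, and the ./my_simulation program are hypothetical, and the demonstration only prints the commands instead of running them.

```python
import subprocess

def run_job_with_staging(stage_in, compute, stage_out, run=subprocess.check_call):
    """Run stage-in, compute, and stage-out in order.

    With the default runner, a non-zero exit status raises an exception,
    so later steps are skipped when an earlier one fails.
    """
    for step in (stage_in, compute, stage_out):
        run(step)

def show(cmd):
    """Stand-in runner that prints a command instead of executing it."""
    print(" ".join(cmd))

# Hypothetical paths and program name, for illustration only.
run_job_with_staging(
    ["globus-url-copy", "gsiftp://tg-gridftp.ncsa.teragrid.org/~/input.dat",
     "file:///scratch/input.dat"],
    ["./my_simulation", "/scratch/input.dat", "/scratch/output.dat"],
    ["globus-url-copy", "file:///scratch/output.dat",
     "gsiftp://tg-gridftp.tacc.teragrid.org/~/output.dat"],
    run=show,
)
```

A valid proxy certificate is still needed before any real transfer in such a script.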

Data Transfer Performance
  • What impacts transfer rates?
    • Disk and filesystem speed
    • Connectivity of filesystem to node
    • Node characteristics & load
    • Connectivity of node to WAN
    • For all networks
      • Bandwidth
      • Latency
      • Buffer Size
      • Protocol
      • Load
      • Encryption …
  • Don’t expect 40 Gb/sec!
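
One way to see why 40 Gb/sec is out of reach: an end-to-end transfer can go no faster than the slowest link on its path. The sketch below uses the link speeds quoted on this slide (1 Gb/s node link, 30 Gb/s site uplink, 40 Gb/s backbone) and assumes 1 GByte means 10^9 bytes; it ignores disk speed, protocol overhead, and load, so it gives only a lower bound on transfer time.

```python
def min_transfer_seconds(size_bytes, link_rates_gbps):
    """Lower bound on transfer time: the slowest link limits the whole path."""
    bottleneck_bits_per_s = min(link_rates_gbps) * 1e9
    return size_bytes * 8 / bottleneck_bits_per_s

# node -> switch -> backbone -> switch -> node, in Gb/s
path = [1, 30, 40, 30, 1]
print(min_transfer_seconds(10**9, path))  # 8.0 seconds for 1 GByte
```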

[Diagram: each node connects to its site switch at 1 Gb/s; each switch connects at 30 Gb/s to the WAN (TG Backbone), which runs at 40 Gb/s.]

Performance – Choices Matter
  • Transfer large files for best performance
  • Use fast filesystems, dedicated transfer nodes, optimized transfer parameters
  • Transfer 1 GByte file from NCSA to SDSC (10/6/2004)

GridFTP Terminology - Protocol
  • “GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth, wide-area networks. GridFTP is based on FTP, the highly popular Internet file transfer protocol.”

- Quoted from Globus Alliance website

Terminology - Client
  • GridFTP client programs issue requests that adhere to the GridFTP protocol
    • Users run GridFTP client programs to transfer files
    • There is no client program named gridFTP, which can be confusing because users are told “use gridFTP to transfer your files”
    • tgcp, globus-url-copy and uberftp are three GridFTP client programs that are part of the Common TeraGrid Software Stack (CTSS)

Terminology – 3rd Party Transfer
  • A GridFTP transfer between two GridFTP servers, rather than between a server and a client, is called a third-party transfer
    • A third-party transfer occurs when the GridFTP client initiating the transfer is run on a system that is neither the source nor the destination of the transfer operation
    • Allows use of dedicated transfer nodes

[Diagram: the user runs a GridFTP client on Host A, which sends requests in the GridFTP protocol to the GridFTP server processes on Host B (source of data) and Host C (destination of data); the data flows directly from Host B to Host C.]
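
A small sketch of the distinction, assuming transfers are described by source and destination URLs as in the globus-url-copy exercises: when both URLs name GridFTP servers (gsiftp://), the transfer runs between two servers. The function name and the /home/userid path are hypothetical.

```python
from urllib.parse import urlparse

def is_third_party(src_url, dst_url):
    """True when both ends are GridFTP servers (gsiftp:// URLs)."""
    return all(urlparse(url).scheme == "gsiftp" for url in (src_url, dst_url))

# Client reads a local file itself: an ordinary client-to-server transfer.
print(is_third_party("file:/home/userid/test.file",
                     "gsiftp://tg-login.tacc.teragrid.org/~/test.file"))
# Both ends are servers: the client only issues the requests.
print(is_third_party("gsiftp://tg-gridftp.ncsa.teragrid.org/home/userid/test.file",
                     "gsiftp://tg-gridftp.tacc.teragrid.org/~/test.file"))
```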

Terminology - Server
  • A GridFTP server process understands requests that adhere to the GridFTP protocol, and performs authentication and data transfer operations based on those requests
  • TeraGrid GridFTP servers usually run on:
    • Login nodes:
      • tg-login.<site>.teragrid.org
    • Dedicated GridFTP nodes:
      • tg-gridftp.<site>.teragrid.org
    • Some mass storage front-ends are GridFTP servers
      • mss.ncsa.teragrid.org

TG GridFTP Server Deployment
  • tg-login.<site>.teragrid.org is a login node and also runs a GridFTP server
    • Shared resource; many tasks
  • tg-gridftp.<site>.teragrid.org is a dedicated GridFTP server
    • Dedicated file transfer resource
    • usually better connectivity
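
The naming pattern above can be captured in a one-line helper. This is only an illustration of the convention on this slide; the function name is hypothetical, and not every site necessarily runs both kinds of server.

```python
def teragrid_gridftp_host(site, dedicated=True):
    """Build a server host name following the tg-<role>.<site>.teragrid.org pattern."""
    prefix = "tg-gridftp" if dedicated else "tg-login"
    return f"{prefix}.{site}.teragrid.org"

print(teragrid_gridftp_host("ncsa"))                   # dedicated transfer node
print(teragrid_gridftp_host("tacc", dedicated=False))  # shared login node
```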

TG GridFTP Client Deployment

uberftp

  • interactive GridFTP transfer client
  • configurable TCP buffer size and number of parallel streams

TG GridFTP Client Deployment

globus-url-copy <source_url> <destination_url>

  • command line interface
  • -tcp-bs <size> | -tcp-buffer-size <size>
    • specify the size (in bytes) of the buffer to be used by the underlying ftp data channels
  • -p <parallelism> | -parallel <parallelism>
    • specify the number of streams to be used in the ftp transfer

tgcp [gridFTP-server1:]file1 [gridFTP-server2:]file2

  • command line interface
  • friendly “scp-like” wrapper around globus-url-copy
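
To make the "scp-like wrapper" idea concrete, here is a sketch of how a "[server:]path" argument could be turned into the gsiftp:// URLs that globus-url-copy expects. This is not tgcp's actual implementation: the port 2812 and the "-p 4 -tcp-bs 2000000" defaults are simply copied from the command echoed in the tgcp exercise later in this deck, and the function names are hypothetical.

```python
import posixpath

def to_gsiftp_url(spec, local_server, cwd, port=2812):
    """Convert '[server:]path' into a gsiftp:// URL."""
    host, sep, path = spec.partition(":")
    if not sep:
        # No "server:" prefix, so serve the local file through the local site's server.
        host, path = local_server, posixpath.join(cwd, spec)
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    return f"gsiftp://{host}:{port}{path}"

def wrapper_command(src, dst, local_server, cwd, streams=4, tcp_buffer=2000000):
    """Build the globus-url-copy argument list for an scp-style copy."""
    return ["globus-url-copy", "-p", str(streams), "-tcp-bs", str(tcp_buffer),
            to_gsiftp_url(src, local_server, cwd),
            to_gsiftp_url(dst, local_server, cwd)]

print(" ".join(wrapper_command(
    "test.file", "tg-gridftp.tacc.teragrid.org:/home/userid/test.file-Ex4",
    local_server="tg-gridftp.ncsa.teragrid.org", cwd="/home/ac/gardnerj")))
```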

Hands-on:
  • Participants will be led through a series of exercises using tgcp, globus-url-copy and uberftp.
  • Demonstrates transferring files
    • Between TeraGrid sites
    • Between TG machines and archival storage systems

Hands-on preparation:
  • Login to tg-login.ncsa.teragrid.org if you have not already done so
  • Get the test data file:

wget http://www.psc.edu/~gardnerj/test.file

Hands-on: Exercise 1 – GridFTP between login nodes

Copy a 9 MByte file from the current directory at NCSA to your home directory at TACC. Use the login node at TACC as the remote GridFTP server. Use default transfer parameters.

Use globus-url-copy to transfer the file:

Type command on a single line – no carriage return!

tg-login1> /usr/bin/time -f %e globus-url-copy file:`pwd`/test.file gsiftp://tg-login.tacc.teragrid.org/~/test.file-Ex1

3.18

Hands-on: Exercise 2 – GridFTP between GridFTP Servers

Copy a 9 MByte file from the current directory at NCSA to your home directory at TACC. Use a third-party transfer and the GridFTP server nodes at both NCSA and TACC.

Use globus-url-copy to transfer the file:

tg-login1> /usr/bin/time -f %e globus-url-copy gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file gsiftp://tg-gridftp.tacc.teragrid.org/~/test.file-Ex2

3.01

Hands-on: Exercise 3 – GridFTP between GridFTP Servers

Copy a 9 MByte file from the current directory at NCSA to your home directory at TACC. Use a third-party transfer and the GridFTP server nodes at both NCSA and TACC. Use optimized transfer parameters.

Use globus-url-copy to transfer the file:

tg-login1> /usr/bin/time -f %e globus-url-copy -tcp-bs 4000000 -p 4 gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file gsiftp://tg-gridftp.tacc.teragrid.org/~/test.file-Ex3

2.54

Hands-on: Exercise 4 – Using tgcp

Copy a 9 MByte file from your home directory at NCSA to your home directory at TACC using tgcp. tgcp automatically uses third-party transfers and optimized transfer parameters.

Add tgcp to your path (it is not there by default):

tg-login1> soft add +tgcp

Use tgcp to transfer the file:

tg-login1> /usr/bin/time -f %e tgcp test.file tg-gridftp.tacc.teragrid.org:/home/userid/test.file-Ex4

globus-url-copy -p 4 -tcp-bs 2000000 gsiftp://tg-gridftp.ncsa.teragrid.org:2812/home/ac/gardnerj/test.file gsiftp://tg-gridftp.tacc.teragrid.org:2812/home/gardnerj/test.file

4.06 (?!!)

Hands-on: Exercise 5 – pg 1: UberFTP between login nodes

Copy a 9 MByte file from your NCSA home directory to TACC. Use optimized transfer parameters. Interactive session.

Start uberftp and set transfer parameters:

tg-login1> uberftp

uberftp> parallel 4

uberftp> tcpbuf 4000000

TCP buffer set to 4000000 bytes

Open connection to TACC:

uberftp> open tg-login.tacc.teragrid.org

%%% BANNER %%%

220 UNIX Archive FTP server ready.

230 User xxx logged in.

Hands-on: Exercise 5 – pg 2: UberFTP between login nodes

Copy the file:

uberftp> put test.file test.file-Ex5

150 Opening BINARY connection(s) for test.file-Ex5.

226 Transfer complete.

Transfer rate 9621728 bytes in 0.51 seconds. 19017.90 KB/sec

Get a listing of the TACC home directory:

uberftp> ls

-rw---- user group 9621728 date test.file-Ex1

-rw---- user group 9621728 date test.file-Ex2

-rw---- user group 9621728 date test.file-Ex3

. . .

Exit UberFTP:

uberftp> quit
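
The transfer-rate line reported by uberftp above can be sanity-checked with a little arithmetic. The sketch below works backwards from the reported rate to the elapsed time under two possible meanings of "KB"; the result suggests that uberftp counted 1000 bytes per KB here, since only that reading reproduces the displayed 0.51 seconds. That interpretation is an inference from these numbers, not something stated on the slide.

```python
size_bytes = 9621728           # from the uberftp output
reported_kb_per_s = 19017.90   # from the uberftp output

def implied_seconds(size_bytes, rate_kb_per_s, bytes_per_kb):
    """Elapsed time implied by a reported rate, for a given size of 'KB'."""
    return size_bytes / (rate_kb_per_s * bytes_per_kb)

for bytes_per_kb in (1000, 1024):
    seconds = implied_seconds(size_bytes, reported_kb_per_s, bytes_per_kb)
    print(bytes_per_kb, round(seconds, 2))
# 1000 -> 0.51 (matches the output), 1024 -> 0.49

# The same rate expressed in megabits per second, assuming 1000 bytes per KB.
print(round(reported_kb_per_s * 1000 * 8 / 1e6, 1))
```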

Hands-on: Exercise 6 – pg 1: UberFTP between GridFTP servers

Copy a 9 MByte file from your NCSA home directory to TACC using third-party transfers. Use optimized transfer parameters. Interactive session.

Start uberftp and set transfer parameters:

tg-login1> uberftp

uberftp> parallel 4

uberftp> tcpbuf 4000000

TCP buffer set to 4000000 bytes

Hands-on: Exercise 6 – pg 2: UberFTP between GridFTP servers

Open “local” connection to NCSA dedicated GridFTP server:

uberftp> lopen tg-gridftp.ncsa.teragrid.org

220 tg-gridftp4.ncsa...blah..blah ready.

230 User xxx logged in.

Open “remote” connection to TACC dedicated GridFTP server:

uberftp> open tg-gridftp.tacc.teragrid.org

220 lonestar GridFTP...blah..blah ready.

230 User xxx logged in.

Hands-on: Exercise 6 – pg 3: UberFTP between GridFTP servers

Copy the file:

uberftp> put test.file test.file-Ex6

src> 150 Opening BINARY mode data connection(s).

dst> 150 Opening BINARY mode data connection(s).

src> 226 Transfer complete.

dst> 226 Transfer complete.

Exit UberFTP:

uberftp> quit

Useful UberFTP commands
  • Unix-like commands
    • ls, cd, mkdir, rmdir, pwd, rm
  • Put “l” in front for “local” versions of commands
    • lls, lcd, lmkdir, lrmdir, lpwd, lrm
  • put
    • transfer from local host to remote host
  • get
    • transfer from remote host to local host
  • mput, mget
    • transfer multiple files between hosts
  • help

Tweaking Optimization Parameters

globus-url-copy

  • -tcp-bs <size> | -tcp-buffer-size <size>
    • specify the size (in bytes) of the buffer to be used by the underlying ftp data channels
    • “Low” network traffic: 8000000
    • “High” network traffic: 4000000
  • -p <parallelism> | -parallel <parallelism>
    • specify the number of streams to be used in the ftp transfer
    • Low network traffic: 1
    • High network traffic: 2 - 4

Tweaking Optimization Parameters

uberftp

  • tcpbuf <size>
    • specify the size (in bytes) of the buffer to be used by the underlying ftp data channels
    • “Low” network traffic: 8000000
    • “High” network traffic: 4000000
  • parallel <parallelism>
    • specify the number of streams to be used in the ftp transfer
    • Low network traffic: 1
    • High network traffic: 2 - 4
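
The buffer sizes above are the presenter's recommendations. One common rule of thumb, not stated on these slides, is to size the TCP buffer to the bandwidth-delay product (bandwidth times round-trip time). The sketch below assumes a 1 Gb/s node link and a 64 ms round-trip time; the RTT is purely illustrative and was not measured on TeraGrid, and splitting the buffer evenly across parallel streams is a heuristic as well.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes: bits/s * seconds / 8."""
    return bandwidth_bits_per_s * rtt_ms // 8000

def per_stream_buffer(bandwidth_bits_per_s, rtt_ms, streams):
    """Heuristic: share the bandwidth-delay product evenly across streams."""
    return bdp_bytes(bandwidth_bits_per_s, rtt_ms) // streams

# Assumed values: 1 Gb/s link, 64 ms round trip.
print(bdp_bytes(1_000_000_000, 64))             # 8000000
print(per_stream_buffer(1_000_000_000, 64, 2))  # 4000000
```

Under these assumed numbers the results happen to line up with the 8000000 and 4000000 values on the slides, but measured round-trip times on a real path should drive the actual choice.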

Using Robotic-Tape Archival Resources
  • NCSA Mass Storage System (MSS)
    • Accessible using GridFTP to mss.ncsa.teragrid.org
  • TACC SGI Data Migration Facility (DMF)
    • Accessible by simply placing files in $ARCHIVE directory
  • SDSC HPSS archival storage system
    • Use HSI from SDSC cluster only
  • PSC “Golem”
    • Accessible using GridFTP to tg-gridftp.psc.teragrid.org

Using Robotic-Tape Archival Resources
  • Files on these machines are transferred to their local disks, but may be automatically migrated to tape if necessary.
  • If you access a file that has been migrated to tape, it will be retrieved automatically, but expect some delay (up to a few minutes)
  • Storage capacity is essentially infinite!

Hands-on: Exercise 7 – pg 1

Copy several 9 MByte files from your home directory at TACC to the NCSA Mass Storage System. Use 3rd party transfer at TACC.

GSISSH from NCSA to TACC:

tg-login> gsissh tg-login.tacc.teragrid.org

Start uberftp session:

lonestar> uberftp

Establish “local” connection to TACC dedicated GridFTP server:

uberftp> lopen tg-gridftp.tacc.teragrid.org

220 lonestar GridFTP..blah..blah..ready.

230 User xxx logged in.

Establish “remote” connection to the NCSA Mass Storage System:

uberftp> open mss.ncsa.teragrid.org

%%%%%Lots of Stuff%%%%%%%

230 User xxx logged in.

Hands-on: Exercise 7 – pg 2

Put multiple files to NCSA MSS:

uberftp> mput test.file*

src> 150 Opening BINARY mode data connection for test file...

dst> 150 Opening BINARY mode data connection for test file...

src> 226 Transfer complete.

dst> 226 Transfer complete.

. . .

Hands-on: Exercise 7 – pg 3

In the listing below, DK indicates that a file is on disk; AR indicates that a file is on tape.

Get a listing of the Mass Storage System directory:

uberftp> ls

-rw---- user group DK common 9621728 date test.file-Ex1

-rw---- user group DK common 9621728 date test.file-Ex2

-rw---- user group DK common 9621728 date test.file-Ex3

. . .

Quit uberftp:

uberftp> quit
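
The disk/tape flag in the listing can be picked out programmatically. The sketch below assumes the schematic column layout shown on this slide, with the flag in the fourth field and the file name last; a real listing may have more date fields or a different layout, and the function name is hypothetical.

```python
RESIDENCY = {"DK": "disk", "AR": "tape"}

def file_residency(listing_line):
    """Return (file name, 'disk' | 'tape' | 'unknown') for one listing line."""
    fields = listing_line.split()
    # Assumed layout: permissions, user, group, flag, ..., file name.
    return fields[-1], RESIDENCY.get(fields[3], "unknown")

print(file_residency("-rw---- user group DK common 9621728 date test.file-Ex1"))
```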

Using PSC “Golem”
  • tg-gridftp.psc.teragrid.org maps directly onto Golem’s filesystem.

Example:

tg-login1> globus-url-copy -tcp-bs 4000000 -p 4 gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file gsiftp://tg-gridftp.psc.teragrid.org/~/test.file

Using TACC DMF
  • Simply copy files to $ARCHIVE directory
  • Files in this directory are automatically migrated to tape if necessary.
  • If you access a file that has been migrated to tape, it will be retrieved automatically, but expect some delay (up to a few minutes)
  • /archive/teragrid/username is visible from the login nodes, but not the TACC dedicated GridFTP servers.
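
Because archiving at TACC is an ordinary file copy, it can be scripted with standard tools. A minimal sketch, assuming the ARCHIVE environment variable is set by the login environment as the slide implies; the function name is hypothetical, and per the last bullet this would be run on a login node.

```python
import os
import shutil

def copy_to_archive(paths, archive_dir=None):
    """Copy files into the archive directory and return the new paths."""
    # Assumption: $ARCHIVE points at the user's archive directory.
    target = archive_dir or os.environ["ARCHIVE"]
    return [shutil.copy2(path, target) for path in paths]
```

For example, copy_to_archive(["test.file"]) would place a copy of test.file in $ARCHIVE, after which migration to tape is handled by DMF.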

Hands-on: Wrapup

Logout of TACC gsissh session:

lonestar> exit

Destroy your proxy:

tg-login> grid-proxy-destroy

Logout of NCSA ssh session:

tg-login> exit

Data Transfer Summary
  • GridFTP clients tgcp, globus-url-copy and uberftp can be used to perform transfers between many TeraGrid online filesystems and mass storage systems accessible via GridFTP servers.
  • Users responsible for managing data transfers, including job-related data movement which can be incorporated into job scripts.
  • Choose servers, filesystems, and transfer parameters wisely to optimize performance.
  • Ongoing efforts to improve rates and usability.

Useful URLs for help
  • TeraGrid user information overview
    • http://www.teragrid.org/userinfo/index.html
  • Summary of TG Resources
    • http://www.teragrid.org/userinfo/guide_hardware_table.html
  • Summary of machines with links to site-specific user guides (just click on the name of each site)
    • http://www.teragrid.org/userinfo/guide_hardware_specs.html
  • Data Transfer guide
    • http://www.teragrid.org/userinfo/guide_data_transfer.html
  • Archival Storage guide
    • http://www.teragrid.org/userinfo/guide_data_storage.html#archival