
GASS: A Data Movement and Access Service for Wide Area Computing Systems

Joseph Bester∗, Ian Foster∗†, Carl Kesselman‡, Jean Tedesco†∗, Steven Tuecke∗. Outline: 1. Introduction. 2. Background. 3. GASS Architecture. 4. GASS Implementation. 5. Application. 6. Performance Studies.


Presentation Transcript


  1. GASS: A Data Movement and Access Service for Wide Area Computing Systems. Joseph Bester∗, Ian Foster∗†, Carl Kesselman‡, Jean Tedesco†∗, Steven Tuecke∗

  2. Outline. 1 Introduction. 2 Background. 3 GASS Architecture. 4 GASS Implementation. 5 Application. 6 Performance Studies.

  3. Introduction. Requirements: applications should need little modification; high-performance implementations; support for application-oriented management. Existing technology: Web-based file systems (transparent access); Condor (relinking with a special I/O library); RIO, the Remote I/O system (adopts the MPI-IO parallel I/O library). Strategy: optimize for common grid I/O patterns, to fit the system; require no specialized services, so that all resources can be used; manage bandwidth to optimize performance; information that is not known until run time has to be predicted.

  4. Background (1/4). I/O requirements in Grid applications. Hierarchical Data Format input: just access the 'nearest' copy. Regional model: accesses other input, e.g. topography. Diagnostic data: information is streamed back. Output data must be stored somewhere. Exploratory model.

  5. Background (2/4). So GASS must support: Uniform access to files, across diverse data sources (FTP, HTTP, tape, disk) and a dynamic resource set. Streaming I/O. Little or no program modification, because users are used to UNIX I/O. Programmer-directed performance optimization, i.e. a way to override the default strategy for performance.

  6. Background (3/4). Existing systems. Andrew File System (AFS): a kernel-level distributed file system (DFS). Prospero File System: a DFS for heterogeneous environments, but it does not address performance. Condor: a high-throughput computing system; linking against a run-time library that replaces I/O system calls gives transparent access with few requirements, but no cache management.

  7. Background (4/4). Legion: a next-generation virtual computer; files are copied into LS using a specialized standard, with some bandwidth management. WebFS and UFO: Web-based data sources.

  8. GASS Architecture. Access patterns and default data movement strategies. Strategy one: fetch and cache on first read open. This is inappropriate if a file is large: computation may be delayed too long while the file is transferred, or the local cache may be too small to hold the entire file. Strategy two: flush the cache and transfer on last write close.

  9. Figure 1: The GASS system is optimized for I/O patterns (a)-(c); patterns (d)-(f) are not supported efficiently.

  10. Figure 2: The GASS cache architecture. Files opened by application processes (represented by circles) are maintained in a local cache directory; they are copied from the remote location (on open, if opened for reading) and/or to the remote location (on close, if created or opened for writing).

  11. GASS Architecture. Specialized data movement strategies. Prestaging and poststaging extend cache management. Prestaging corresponds to 'open file for reading': allocate a cache entry, move the file into it, and increment the reference count. Poststaging corresponds to 'close file that was open for writing'. Typical uses are file staging and redirection of standard I/O. Low-level cache control is more fine-grained. Benefit 1: it allows file caching to be directed to specific locations, on a per-file and/or per-user basis, for example to access-controlled user file systems. Benefit 2: it can exploit a local DFS, e.g. DPSS, NFS, AFS.
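The prestaging and poststaging steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Globus code: the `remote` store, the `cache` and `refcount` dictionaries, the function names, and the example URLs are all hypothetical stand-ins.

```python
def prestage(remote, cache, refcount, url):
    """Hypothetical prestage: like 'open file for reading' done ahead of the job."""
    if url not in cache:
        cache[url] = remote[url]          # allocate a cache entry and move the file in
    refcount[url] = refcount.get(url, 0) + 1   # count++ keeps the entry alive


def poststage(remote, cache, refcount, url):
    """Hypothetical poststage: like 'close file that was open for writing'."""
    remote[url] = cache[url]              # transfer the cached output to its URL
    refcount[url] -= 1
    if refcount[url] == 0:                # last reference released (sketch assumption)
        del refcount[url]
        del cache[url]


# Toy stand-ins for a remote server and the local cache (assumptions).
remote = {"ftp://example.org/input.dat": b"initial data"}
cache, refcount = {}, {}
in_url = "ftp://example.org/input.dat"
out_url = "ftp://example.org/output.dat"

prestage(remote, cache, refcount, in_url)
# The job itself touches only the cache, never the remote store.
cache[out_url] = cache[in_url].upper()
refcount[out_url] = 1
poststage(remote, cache, refcount, out_url)
```

Because both steps act on the cache rather than on the application, a job can be staged this way without changing its source.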

  12. GASS Architecture. GASS operation: minimal changes to the application; multiple file paths, with higher performance than a DFS. The calls globus_gass_open()/globus_gass_close() and globus_gass_fopen()/globus_gass_fclose() use URLs instead of filenames. The URL is cached in case of multiple opens. The open calls return descriptors to files in the local cache or sockets to a remote server.

  13. Flowchart of globus_gass_open()/close(). globus_gass_open(): URL in cache? If no, download the file into the cache. Then, or if yes, open the cached file and add a cache reference. globus_gass_close(): Modified? If yes, upload the changes. Then, or if no, remove the cache reference.
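The open/close flow on this slide can be modeled with a small Python class. It is a toy sketch, not the Globus implementation: the class name, the `remote` dictionary, the `write` helper, and the choice to drop the cached copy when the last reference goes away are assumptions made for illustration.

```python
class GassCacheSketch:
    """Toy model of the open/close flowchart; every name here is hypothetical."""

    def __init__(self, remote):
        self.remote = remote    # stand-in remote store: URL -> bytes
        self.cache = {}         # URL -> bytearray (local cached copy)
        self.refs = {}          # URL -> number of outstanding opens
        self.dirty = set()      # URLs modified since they were cached
        self.downloads = 0      # counts downloads, to show cache hits

    def open(self, url):
        if url not in self.cache:                      # "URL in cache?" -> no
            self.cache[url] = bytearray(self.remote.get(url, b""))
            self.downloads += 1
        self.refs[url] = self.refs.get(url, 0) + 1     # add cache reference
        return self.cache[url]

    def write(self, url, data):
        self.cache[url] += data
        self.dirty.add(url)

    def close(self, url):
        if url in self.dirty:                          # "Modified?" -> yes
            self.remote[url] = bytes(self.cache[url])  # upload changes
            self.dirty.discard(url)
        self.refs[url] -= 1                            # remove cache reference
        if self.refs[url] == 0:                        # assumption: evict when unused
            del self.refs[url]
            del self.cache[url]


url = "http://example.org/in.dat"
remote = {url: b"abc"}
gass = GassCacheSketch(remote)
gass.open(url)
gass.open(url)          # second open is served from the cache
gass.write(url, b"def")
gass.close(url)         # modified, so the changes are uploaded
```

The second `open` does not trigger another download, which is the point of caching the URL in case of multiple opens.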

  14. GASS Architecture. Integration with Globus. GRAM features: allocates resources; initiates and manages computation. GASS can therefore be built as an extension of GRAM, with small overhead.

  15. GASS Implementation. APIs. File Access API (synchronous): reads from the cache. Cache Management API (synchronous): supports insertion, locking (reference counting), and removal, and allows multiple overlapping caches. Client Implementation API (asynchronous): allows applications to eliminate data copies, to select the transfer unit size and a proxy server, and to overlap data transfer with cache operations. Server Implementation API (asynchronous): the remote file access protocol.

  16. GASS Implementation. Cache Management API: add and delete entries; maintain a log of (filename, local name, timestamp, tag list). To guard against races, a caller blocks on contention. It does not provide any wide-area cache coherency. Client and Server APIs: Nexus-based protocols can deliver performance superior to IP-based wide-area networking protocols.
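A minimal Python sketch of the cache log described on this slide follows. It assumes that each entry holds a filename (URL), local name, timestamp, and tag list, and that a single lock makes contending callers block; the class and method names are hypothetical and do not reproduce the Globus cache API.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    url: str                 # the "filename" in the log
    local_name: str          # where the cached copy lives locally
    timestamp: float
    tags: list = field(default_factory=list)   # one tag per holder of the entry


class CacheLog:
    """Hypothetical cache log; one lock serializes add and delete."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def add(self, url, local_name, tag):
        with self._lock:                       # contending callers block here
            entry = self._entries.get(url)
            if entry is None:
                entry = CacheEntry(url, local_name, time.time())
                self._entries[url] = entry
            entry.tags.append(tag)
            return entry.local_name            # existing local name wins

    def delete(self, url, tag):
        with self._lock:
            entry = self._entries[url]
            entry.tags.remove(tag)
            if not entry.tags:                 # no tags left: drop the entry
                del self._entries[url]
                return True
            return False


log = CacheLog()
first = log.add("http://example.org/data", "/tmp/cache/a", "job1")
second = log.add("http://example.org/data", "/tmp/cache/b", "job2")
```

Here the tag list doubles as the reference count: the entry survives until the last tag is deleted. Nothing in the sketch keeps two caches consistent with each other, in line with the slide's note that no wide-area cache coherency is provided.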

  17. Application. Globus Executable Management System: assigns executables, combining the GASS client and caching APIs. GASS command-line tools allow the programmer to implement prestaging, poststaging, and other remote file operations without modifying user applications. globus-rcp: third-party-initiated data transfer, with remote start-up and peer-to-peer authentication. globusrun: associates tasks automatically; the cache avoids unnecessary downloads; supports arbitrary applications; job management can schedule in linear time. MPICH-G = Message Passing Interface + globusrun (to initiate MPI programs).

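  18. Application. SF-Express: distributed supercomputing with 100,000 entities on 1,000 CPUs. Requirements: locate, assemble, and manage resources. Result: append-mode output and redirected diagnostics. Without GASS: One would have to contact the administrators of these computers one by one, agree on a time, and reserve the machines. How would the SF-Express program code and initial data be transferred to each parallel computer and started? Apparently only by logging in to each machine and doing it by hand. And if something goes wrong while the program is running? SF-Express would have to stop and wait for a chance to start over.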

  19. Performance Studies. GASS cache overhead. Table 1: Time to transfer remote files of various sizes directly into memory (To Memory), through a /tmp file (No Cache), and through the GASS cache on /tmp (GASS Cache). All times are in seconds and are the average of multiple runs. See text for details.

  20. Performance Studies. GASS cache contention. Table 2: Results of contention experiments in which multiple processes open and read a file at the same time, via standard Unix open and close calls; GASS transfer followed by read; and GASS access to a prestaged file. All times are in seconds. See text for details.

  21. Performance Studies. GASS and AFS performance. Table 3: Overall time required to read the content of a remote file using GASS and AFS to access the file. All times are in seconds. See text for details.

  22. Conclusion. High performance: efficient use of bandwidth; data movement strategies without modifying code. Useful: suited to HPSS, DPSS, SRB. Future: interfacing SRB into GASS; Globus = GASS + advance reservation.

  23. Thank you!