
Managing and Scheduling Data Placement (DaP) Requests



Presentation Transcript


  1. Managing and Scheduling Data Placement (DaP) Requests

  2. Outline
  • Motivation
  • DaP Scheduler
  • Case Study: DAGMan
  • Conclusions

  3. Demand for Storage
  • Applications require access to larger and larger amounts of data
    • Database systems
    • Multimedia applications
    • Scientific applications, e.g. High Energy Physics & Computational Genomics
  • Currently terabytes, soon petabytes, of data

  4. Is Remote Access Good Enough?
  • Huge amounts of data (mostly on tapes)
  • Large number of users
  • Distance / low bandwidth
  • Different platforms
  • Scalability and efficiency concerns
  => Middleware is required

  5. Two Approaches
  • Move the job/application to the data
    • Less common
    • Insufficient computational power at the storage site
    • Not efficient
    • Does not scale
  • Move the data to the job/application

  6. Move Data to the Job
  [Diagram: a huge tape library (terabytes) feeds a remote staging area; data crosses the WAN to a local storage area (e.g. local disk, NeST server), then the LAN to the compute cluster.]

  7. Main Issues
  1. Insufficient local storage area
  2. CPU should not wait much for I/O
  3. Crash recovery
  4. Different platforms & protocols
  5. Make it simple

  8. Data Placement Scheduler (DaPS)
  • Intelligently manages and schedules data placement (DaP) activities/jobs
  • What Condor is for computational jobs, DaPS is for DaP jobs
  • Just submit a bunch of DaP jobs and then relax…

  9. DaPS Architecture
  [Diagram: DaPS clients send requests to the DaPS server, which accepts, queues, schedules, and executes them. Data moves between local storage (local disk, NeST server) and remote storage (GridFTP, SRB, SRM servers) via buffered get/put operations and third-party transfers.]

  10. DaPS Client Interface
  • Command line:
    • dap_submit <submit file>
  • API:
    • dapclient_lib.a
    • dapclient_interface.h
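As a sketch of how the command-line interface above might be used: a submit file would hold a DaP job ClassAd of the kind shown on the next slides. The file name and hosts here are hypothetical; the exact submit-file format is an assumption based on the ClassAds in this talk.

```
# hypothetical submit file: x.dap
[
  Type    = Transfer;
  Src_url = srb://remote.example.edu/data/x.dat;
  Dst_url = nest://local.example.edu/data/x.dat;
]
```

This would then be handed to the scheduler with `dap_submit x.dap`.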

  11. DaP Jobs
  • Defined as ClassAds
  • Currently four types:
    • Reserve
    • Release
    • Transfer
    • Stage

  12. DaP Job ClassAds
  [
    Type           = Reserve;
    Server         = nest://turkey.cs.wisc.edu;
    Size           = 100MB;
    reservation_no = 1;
    ……
  ]
  [
    Type           = Transfer;
    Src_url        = srb://ghidorac.sdsc.edu/kosart.condor/x.dat;
    Dst_url        = nest://turkey.cs.wisc.edu/kosart/x.dat;
    reservation_no = 1;
    ......
  ]

  13. Supported Protocols
  • Currently supported:
    • FTP
    • GridFTP
    • NeST (chirp)
    • SRB (Storage Resource Broker)
  • Very soon:
    • SRM (Storage Resource Manager)
    • GDMP (Grid Data Management Pilot)

  14. Case Study: DAGMan
  [Diagram: DAGMan reads a .dag file describing jobs A, B, C, D and submits them to the Condor job queue.]

  15. Current DAG Structure
  • All jobs are assumed to be computational jobs
  [Diagram: a DAG of computational jobs A, B, C, D.]

  16. Current DAG Structure
  • If data transfer to/from remote sites is required, it is performed via pre- and post-scripts attached to each job
  [Diagram: each job, e.g. Job B, is wrapped by a PRE script and a POST script.]
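The pre-/post-script approach above can be sketched as a .dag fragment using DAGMan's standard SCRIPT PRE/POST syntax (the submit-file and script names are hypothetical):

```
# Job B is a computational job; data movement happens
# outside the DAG, in scripts wrapped around the job.
JOB B b.submit
SCRIPT PRE  B stage_in.sh    # transfer input data before B runs
SCRIPT POST B stage_out.sh   # transfer output data after B finishes
```

Because the scripts are invisible to the DAG, failures during data transfer cannot be scheduled or retried as first-class jobs, which motivates the new structure on the next slide.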

  17. New DAG Structure
  • Add DaP jobs to the DAG structure
  [Diagram: the PRE/POST-wrapped Job B becomes an explicit chain of DaP jobs: reserve in & out → transfer in → Job B → release in → transfer out → release out.]
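A hypothetical sketch of the new structure as a .dag fragment, with the DaP steps as explicit nodes in the chain shown above (the talk does not give the exact syntax for DaP nodes, so the node and file names here are assumptions):

```
JOB reserve_in_out  reserve.dap       # DaP job: reserve space for input & output
JOB transfer_in     transfer_in.dap   # DaP job: stage input data in
JOB B               b.submit          # computational job
JOB release_in      release_in.dap    # DaP job: release input reservation
JOB transfer_out    transfer_out.dap  # DaP job: stage output data out
JOB release_out     release_out.dap   # DaP job: release output reservation

PARENT reserve_in_out CHILD transfer_in
PARENT transfer_in    CHILD B
PARENT B              CHILD release_in
PARENT release_in     CHILD transfer_out
PARENT transfer_out   CHILD release_out
```

With data placement as first-class DAG nodes, DAGMan can route them to the DaPS queue and handle their failures and retries like any other job.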

  18. New DAGMan Architecture
  [Diagram: DAGMan reads the .dag file and submits computational jobs (A, B, C, D) to the Condor job queue and DaP jobs (X, Y) to the DaPS job queue.]

  19. Conclusion
  • More intelligent management of remote data transfer & staging
    • Increase local storage utilization
    • Maximize CPU throughput

  20. Future Work
  • Enhanced interaction with DAGMan
  • Data-level management instead of file-level management
  • Possible integration with Kangaroo to keep the network pipeline full

  21. Thank You for Listening & Questions
  • For more information:
    • Drop by my office anytime
    • Room: 3361, Computer Science & Stats. Bldg.
    • Email to: condor-admin@cs.wisc.edu
