Reliable and Efficient Data Placement in a Grid Environment

Reliable and Efficient Data Placement in a Grid Environment. PhD Research Summary, Tevfik Kosar, IBM TJ Watson Research Center, June 22nd, 2004. Grid Computing: “Distributed computing across networks using open standards supporting heterogeneous resources” (IBM).

Presentation Transcript


  1. Reliable and Efficient Data Placement in a Grid Environment PhD Research Summary Tevfik Kosar IBM TJ Watson Research Center June 22nd, 2004

  2. Grid Computing: “Distributed computing across networks using open standards supporting heterogeneous resources” (IBM)

  3. Motivations for Grid Computing • Increase Capacity • Improve Efficiency / Reduce Costs • Reduce “Time to Results” • Provide Reliability / Availability • Support Heterogeneous Systems • Enable Collaborations • …

  4. Future of Grid: “Grid is hot because it's the right technology for its time and within the next five years it will be a de facto part and parcel of virtually every major financial markets firm's infrastructure.” (Tabb Group, Grid Computing in Financial Markets: Moving Beyond Compute Intensive Applications)

  5. Moving Beyond Compute-Intensive Applications: “While the compute-intensive segment is growing, the vast amount of new grid growth will not come from compute-intensive solutions, but from data and service grids whose application we believe to be much wider than traditional compute grids.” (Tabb Group, Grid Computing in Financial Markets: Moving Beyond Compute Intensive Applications)

  6. What about Science? • Genomic information processing applications • Biomedical Informatics Research Network (BIRN) applications • Cosmology applications (MADCAP) • Methods for modeling large molecular systems • Coupled climate modeling applications • Real-time observatories, applications, and data-management (ROADNet)

  7. Some Remarkable Numbers: characteristics of four physics experiments targeted by GriPhyN (table not reproduced in this transcript; source: GriPhyN Proposal, 2000)

  8. Even More Remarkable… “...the data volume of CMS is expected to subsequently increase rapidly, so that the accumulated data volume will reach 1 Exabyte (1 million Terabytes) by around 2015.” (Source: PPDG Deliverables to CMS)

  9. Access to Remote Data • Remote I/O • Move application close to data • Move data close to application • Move both data and application

  10. Access to Remote Data • Remote I/O • Move application close to data • Move data close to application • Move both data and application • Remote I/O does not scale well for large data sets!

  11. Access to Remote Data • Remote I/O • Move application close to data • Move data close to application • Move both data and application • Remote I/O does not scale well for large data sets! • Storage sites do not always have sufficient computational power nearby!

  12. Need to move data around (diagram of TB- and PB-scale data sets)

  13. While doing this… • Locate the data • Access heterogeneous resources • Deal with all kinds of failures • Allocate and de-allocate storage • Move the data • Clean up everything. All of these need to be done reliably and efficiently!

  14. Goal • Data placement is crucial in a Grid environment. • Current approaches regard it as a side effect of computation. • Data placement must be regarded as a first-class citizen in the Grid, just like computational jobs.

  15. Approach • Regard data placement activities as full-fledged jobs. • Design and implement a system to reliably and efficiently schedule, execute, monitor, and manage them.

  16. Outline • Introduction • Background • The Concept • Data Placement Subsystem • Progress Made • Contributions • Future Work

  17. Background (diagram, hardware level): CPU, memory, bus, I/O processor, DMA controller, disk

  18. Background (diagram adds the operating systems level): CPU scheduler; I/O subsystem with I/O scheduler and I/O control system

  19. Background (diagram adds the distributed systems level): batch schedulers, above the operating systems and hardware levels

  20. Background (diagram adds a data placement subsystem next to the batch schedulers at the distributed systems level)

  21. Outline • Introduction • Background • The Concept • Data Placement Subsystem • Progress Made • Contributions • Future Work

  22. The Concept: Individual Jobs • Stage-in • Execute the job • Stage-out

  23. The Concept: Individual Jobs • Allocate space for input & output data • Stage-in • Execute the job • Stage-out • Release input space • Release output space
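The steps of an individual job form a small dependency graph rather than a flat list. The following is a minimal Python sketch, assuming hypothetical step names and one plausible set of dependencies (for example, that input space may be released once the job has run); it uses the standard-library graphlib module (Python 3.9+) to derive a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; the dependency edges are an illustrative assumption,
# not taken from the original system. Each key maps to the steps that must
# finish before it can start.
LIFECYCLE = {
    "allocate_space": set(),
    "stage_in": {"allocate_space"},
    "execute": {"stage_in"},
    "stage_out": {"execute"},
    "release_input_space": {"execute"},
    "release_output_space": {"stage_out"},
}


def execution_order(graph):
    """Return one order in which the steps can run without violating a dependency."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in execution_order(LIFECYCLE):
        print(step)
```

Only five of these six steps move or manage data; that observation is what motivates treating them as jobs in their own right.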

  24. Traditional Schedulers • Not aware of the characteristics and semantics of data placement jobs. Compare a computational job (Executable = /tmp/foo.exe; Arguments = a b c d) with a transfer job (Executable = globus-url-copy; Arguments = gsiftp://host1/f1 gsiftp://host2/f2). Any difference?

  25. Understanding Job Characteristics & Semantics • Job_type = transfer, reserve, release? • Source and destination hosts, files, protocols to use? • Determine concurrency level • Can select alternate protocols • Can select alternate routes • Can tune network parameters (TCP buffer size, I/O block size, number of parallel streams) • …
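To a traditional scheduler, the two jobs on slide 24 are both opaque executables. The minimal Python sketch below shows the kind of information a data-placement-aware scheduler can extract once a transfer is described by its semantics rather than its command line. The DataPlacementJob class and the from_command helper are hypothetical, and the assumption that a globus-url-copy invocation carries the source and destination URLs as its last two arguments is made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class DataPlacementJob:
    """Hypothetical model of a data placement job whose semantics a scheduler can inspect."""
    job_type: str              # e.g. "transfer", "reserve", "release"
    src_url: str
    dest_url: str
    parallel_streams: int = 1  # one of the tunable network parameters

    @property
    def src_host(self) -> str:
        return urlparse(self.src_url).hostname or ""

    @property
    def dest_host(self) -> str:
        return urlparse(self.dest_url).hostname or ""

    @property
    def protocols(self) -> tuple:
        return (urlparse(self.src_url).scheme, urlparse(self.dest_url).scheme)


def from_command(executable: str, arguments: str) -> Optional[DataPlacementJob]:
    """Recover transfer semantics from a command line, if it is a known transfer tool.

    Assumption for illustration: the source and destination URLs are the last
    two arguments of a globus-url-copy invocation.
    """
    if executable != "globus-url-copy":
        return None  # an ordinary computational job: nothing to interpret
    src, dest = arguments.split()[-2:]
    return DataPlacementJob("transfer", src, dest)
```

With the hosts and protocols exposed, decisions such as limiting concurrency per host or substituting a different protocol become possible, which is exactly what an opaque executable hides.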

  26. The Concept: Individual Jobs • Allocate space for input & output data • Stage-in • Execute the job • Stage-out • Release input space • Release output space

  27. The Concept: the same steps split into Data Placement Jobs (allocate space for input & output data, stage-in, stage-out, release input space, release output space) and Computational Jobs (execute the job)

  28. Outline • Introduction • Background • The Concept • Data Placement Subsystem • Progress Made • Contributions • Future Work

  29. Architecture diagram: user job descriptions

  30. Architecture diagram adds a planner

  31. Architecture diagram adds a data placement scheduler and a computation scheduler, along with storage systems and compute nodes

  32. Architecture diagram adds a resource broker/policy enforcer, shows resources in place of compute nodes, and adds computation (C.) and data placement (D.) job log files

  33. Architecture diagram adds a data miner, network monitoring tools, and a feedback mechanism

  35. Architecture diagram with the data placement subsystem labeled; components: planner, user job descriptions, resource broker/policy enforcer, data placement scheduler, computation scheduler, storage systems, resources, C. and D. job log files, data miner, network monitoring tools, feedback mechanism
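The data miner and feedback mechanism in this architecture imply turning job log files into statistics the schedulers can act on. The slides do not specify a log format, so the minimal Python sketch below assumes a hypothetical record with link, byte count, duration, and success fields, and summarizes each link's success rate and aggregate throughput.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TransferRecord:
    """Hypothetical fields a data placement job log entry might carry."""
    link: str         # e.g. "hostA-hostB"
    nbytes: int
    seconds: float
    succeeded: bool


def summarize(records):
    """Per-link success rate and aggregate throughput (bytes/s over successful transfers)."""
    stats = defaultdict(lambda: {"ok": 0, "total": 0, "bytes": 0, "secs": 0.0})
    for r in records:
        s = stats[r.link]
        s["total"] += 1
        if r.succeeded:
            s["ok"] += 1
            s["bytes"] += r.nbytes
            s["secs"] += r.seconds
    return {
        link: {
            "success_rate": s["ok"] / s["total"],
            "throughput": s["bytes"] / s["secs"] if s["secs"] else 0.0,
        }
        for link, s in stats.items()
    }
```

A scheduler could, for instance, prefer links with a higher success rate or lower its concurrency on links whose throughput drops; which signals the real feedback mechanism uses is not stated on the slides.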

  36. Outline • Background • Related Work • The Concept • Data Placement Subsystem • Progress Made • Contributions • Future Work

  37. Architecture diagram (as in slide 35) with the implemented components marked

  38. Separation of Jobs. DAG specification: DaP A A.submit • DaP B B.submit • Job C C.submit • ….. • Parent A child B • Parent B child C • Parent C child D, E • …..

  39. Separation of Jobs (diagram): the DAG specification from slide 38 is handed to a workflow manager, shown alongside the job graph with nodes A, B, C, D, E, F

  40. Separation of Jobs (diagram): the workflow manager places computational jobs in a compute job queue and data placement (DaP) jobs in a separate DaP job queue

  41. Separation of Jobs (diagram): the same design with DAGMan as the workflow manager, the Condor job queue for computational jobs, and the Stork job queue for data placement jobs
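The separation of jobs can be illustrated with a small workflow manager that reads the simplified DAG syntax shown on slide 38 and sends every job whose parents have finished to the matching queue. This is a minimal Python sketch, not DAGMan's implementation: it handles only the three line forms on the slide (DaP, Job, and Parent … child …), and the function names and queue representation are hypothetical.

```python
from collections import defaultdict


def parse_dag(lines):
    """Parse the simplified DAG syntax from the slides.

    Supports 'DaP <name> <file>', 'Job <name> <file>' and
    'Parent <names> child <names>' (names may be comma-separated).
    Returns (kinds, parents): node -> "dap"/"job", node -> set of parent nodes.
    """
    kinds, parents = {}, defaultdict(set)
    for line in lines:
        words = line.replace(",", " ").split()
        if not words:
            continue
        keyword = words[0].lower()
        if keyword in ("dap", "job"):
            kinds[words[1]] = keyword
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            for child in words[split + 1:]:
                parents[child].update(words[1:split])
    return kinds, parents


def dispatch_ready(kinds, parents, done):
    """Route each unfinished node whose parents are all done to the matching queue.

    Returns (compute_queue, dap_queue) as lists of node names.
    """
    compute_queue, dap_queue = [], []
    for node, kind in kinds.items():
        if node in done or not parents[node] <= done:
            continue
        (dap_queue if kind == "dap" else compute_queue).append(node)
    return compute_queue, dap_queue
```

For the example on slide 38, only A is ready at first and goes to the data placement queue; once A and B finish, C becomes ready and goes to the compute queue.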

  42. Stork: Data Placement Scheduler • The most important component of the data placement subsystem • Understands the characteristics and semantics of data placement jobs • Can make smart scheduling decisions for reliable and efficient data placement

  43. Support for Heterogeneity: protocol translation using the Stork memory buffer

  44. Support for Heterogeneity: protocol translation using the Stork disk cache
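The two protocol-translation strategies differ in where the data sits between the source protocol and the destination protocol. The minimal Python sketch below uses generic file-like objects as stand-ins for real protocol clients (for example, a reader for the source storage system and a writer for the destination one); the function names and chunk size are hypothetical and say nothing about how Stork itself implements this.

```python
import io
import shutil
import tempfile

CHUNK = 64 * 1024  # illustrative chunk size


def translate_via_memory(reader, writer, chunk_size=CHUNK):
    """Stream from a source-protocol reader to a destination-protocol writer
    through a fixed-size in-memory buffer; nothing is staged on local disk.
    Returns the number of bytes copied."""
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            return total
        writer.write(chunk)
        total += len(chunk)


def translate_via_disk_cache(reader, writer, cache_dir=None):
    """Download the whole object into a local cache file first, then upload it.
    Useful when the two sides cannot be streamed against each other.
    Returns the number of bytes copied."""
    with tempfile.TemporaryFile(dir=cache_dir) as cache:
        shutil.copyfileobj(reader, cache)
        cache.seek(0)
        shutil.copyfileobj(cache, writer)
        return cache.tell()


if __name__ == "__main__":
    payload = b"example payload" * 1000
    sink = io.BytesIO()
    translate_via_memory(io.BytesIO(payload), sink)
    print(sink.getvalue() == payload)
```

The memory buffer avoids local disk I/O but requires both endpoints to be available at once; the disk cache decouples download from upload at the cost of local space, which ties back to the space allocation concerns on slide 13.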

  45. Flexible Job Representation and Multilevel Policy Support [ Type = "Transfer"; Src_Url = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat"; Dest_Url = "nest://turkey.cs.wisc.edu/kosart/x.dat"; …… …… Max_Retry = 10; Restart_in = "2 hours"; ]
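A job description like this carries its own policy, so the scheduler can read it and act on it. The minimal Python sketch below parses only a flat subset of the description syntax (quoted strings and bare integers), which is far smaller than the full ClassAd language, and applies Max_Retry in a retry loop. Interpreting Max_Retry as the total number of attempts is an assumption, and Restart_in is parsed but not acted on, since the slide does not define its exact semantics.

```python
import re


def parse_simple_ad(text):
    """Parse a flat '[ key = value; ... ]' description into a dict.

    A deliberately small subset of the ClassAd syntax: double-quoted strings
    and bare integers only. Keys are lower-cased because ClassAd attribute
    names are case-insensitive.
    """
    ad = {}
    for key, value in re.findall(r'(\w+)\s*=\s*("[^"]*"|\d+)\s*;', text):
        ad[key.lower()] = value.strip('"') if value.startswith('"') else int(value)
    return ad


def run_with_retries(ad, attempt):
    """Call attempt(ad) until it returns True or max_retry attempts are used up.

    Assumption: max_retry counts total attempts. Returns the number of
    attempts made; raises RuntimeError if every attempt fails.
    """
    max_retry = ad.get("max_retry", 1)
    for n in range(1, max_retry + 1):
        if attempt(ad):
            return n
    raise RuntimeError(f"gave up after {max_retry} attempts")
```

Because the retry limit lives in the job description, different users or sites can set different limits without changing the scheduler, which is one reading of "multilevel policy support".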

  46. Run-time Adaptation • Dynamic protocol selection: [ dap_type = "transfer"; src_url = "drouter://slic04.sdsc.edu/tmp/test.dat"; dest_url = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat"; alt_protocols = "nest-nest, gsiftp-gsiftp"; ] Alternatively, with the any:// scheme: [ dap_type = "transfer"; src_url = "any://slic04.sdsc.edu/tmp/test.dat"; dest_url = "any://quest2.ncsa.uiuc.edu/tmp/test.dat"; ]
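Dynamic protocol selection can be read as an ordered fallback: try the protocols named in the URLs first, then each "source-destination" pair listed in alt_protocols. The minimal Python sketch below implements that reading; try_transfer is a hypothetical callback standing in for the real protocol clients, and the any:// form, where the scheduler picks the protocol itself, is not modeled.

```python
from urllib.parse import urlsplit


def candidate_urls(src_url, dest_url, alt_protocols):
    """Yield (src, dest) URL pairs: the original protocols first, then each
    'srcproto-destproto' alternative from alt_protocols, in the listed order."""
    src, dest = urlsplit(src_url), urlsplit(dest_url)
    yield src_url, dest_url
    for pair in filter(None, (p.strip() for p in alt_protocols.split(","))):
        src_proto, dest_proto = pair.split("-", 1)
        yield (src._replace(scheme=src_proto).geturl(),
               dest._replace(scheme=dest_proto).geturl())


def transfer_with_fallback(src_url, dest_url, alt_protocols, try_transfer):
    """Attempt each candidate pair until try_transfer(src, dest) returns True.

    Returns the (src, dest) pair that succeeded, or None if all failed.
    """
    for src, dest in candidate_urls(src_url, dest_url, alt_protocols):
        if try_transfer(src, dest):
            return src, dest
    return None
```

With the slide's example, a failing drouter transfer falls back to nest-nest and then gsiftp-gsiftp, keeping host and path unchanged.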

  47. Run-time Adaptation (2) • Run-time protocol auto-tuning:
  [
    link = "slic04.sdsc.edu - quest2.ncsa.uiuc.edu";
    protocol = "gsiftp";
    bs = 1024KB; //block size
    tcp_bs = 1024KB; //TCP buffer size
    p = 4; //number of parallel streams
  ]
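The slide shows tuned values but not how they are derived. One common textbook heuristic for sizing TCP buffers, offered here as a stand-in and not as the method used by the auto-tuner, is the bandwidth-delay product: the number of bytes that must be in flight to keep the link full, split evenly across the p parallel streams. A minimal Python sketch, assuming bandwidth in bits per second and round-trip time in milliseconds:

```python
def tcp_buffer_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: bits/s * seconds / 8 bits per byte.

    Integer arithmetic (rtt in milliseconds) avoids floating-point rounding.
    """
    return bandwidth_bps * rtt_ms // (8 * 1000)


def per_stream_buffer(bandwidth_bps: int, rtt_ms: int, streams: int) -> int:
    """Split the bandwidth-delay product evenly across parallel streams."""
    if streams < 1:
        raise ValueError("streams must be >= 1")
    return tcp_buffer_bytes(bandwidth_bps, rtt_ms) // streams
```

For a hypothetical 1 Gbit/s path with a 50 ms round-trip time, the product is 6,250,000 bytes, or 1,562,500 bytes per stream with p = 4. Measured values, such as those in the auto-tuning record above, can differ from this estimate because of congestion, host limits, and disk speed.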

  48. Failure Recovery and Efficient Resource Utilization • Fault tolerance: just submit a bunch of data placement jobs, and then go away • Control the number of concurrent transfers from/to any storage system: prevents overloading • Space allocation and de-allocation: makes sure space is available
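Limiting concurrent transfers per storage system amounts to a counting semaphore per host. The minimal Python sketch below is one way to do it with the standard threading module; the StorageThrottle class, the single per-host limit, and the rule that a transfer holds a slot on both its source and destination are illustrative assumptions.

```python
import threading


class StorageThrottle:
    """Cap concurrent transfers per storage host (one limit for all hosts, for simplicity)."""

    def __init__(self, max_per_host: int):
        self._max = max_per_host
        self._lock = threading.Lock()
        self._slots = {}

    def _slot(self, host):
        # Create each host's semaphore lazily, under a lock so two threads
        # cannot create two different semaphores for the same host.
        with self._lock:
            if host not in self._slots:
                self._slots[host] = threading.BoundedSemaphore(self._max)
            return self._slots[host]

    def run(self, src_host, dest_host, transfer):
        """Hold one slot on each endpoint while transfer() runs; return its result."""
        # Acquire in sorted host order so two transfers between the same pair of
        # hosts cannot deadlock by taking the slots in opposite orders.
        slots = [self._slot(h) for h in sorted({src_host, dest_host})]
        for s in slots:
            s.acquire()
        try:
            return transfer()
        finally:
            for s in reversed(slots):
                s.release()
```

Jobs beyond the limit simply block until a slot frees up, so users can submit a large batch and rely on the throttle to keep any single storage system from being overloaded.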

  49. Case Study I
