
Explicit Control in a Batch-Aware Distributed File System

This paper focuses on harnessing and managing remote storage for batch-pipelined, I/O-intensive workloads, as found in scientific computing and wide-area grid computing. The authors propose BAD-FS, a batch-aware distributed file system that combines detailed workload information with explicit storage control to improve performance, handle failures more robustly, and simplify the implementation.

Presentation Transcript


  1. Explicit Control in a Batch-Aware Distributed File System

  2. Focus of work
      • Harnessing, managing remote storage
      • Batch-pipelined, I/O-intensive workloads
      • Scientific workloads
      • Wide-area grid computing

  3. Batch-pipelined workloads
      • General properties
        • Large number of processes
        • Process and data dependencies
        • I/O intensive
      • Different types of I/O
        • Endpoint
        • Batch
        • Pipeline
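
To make the three I/O classes concrete, the following sketch models one pipeline of such a workload as jobs with typed inputs and outputs. The class and file names (IOType, File, Job, and the file names) are illustrative assumptions rather than BAD-FS identifiers; the sizes reuse the AMANDA figures quoted on slide 15.

```python
# Illustrative model of one pipeline in a batch-pipelined workload.
# Class, job, and file names are hypothetical, not BAD-FS identifiers;
# the sizes reuse the AMANDA figures quoted on slide 15 (in MB).
from __future__ import annotations
from dataclasses import dataclass, field
from enum import Enum

class IOType(Enum):
    ENDPOINT = "endpoint"   # final output that must return to home storage
    BATCH = "batch"         # read-only input shared across many pipelines
    PIPELINE = "pipeline"   # intermediate data passed between stages of one pipeline

@dataclass
class File:
    name: str
    size_mb: int
    io_type: IOType

@dataclass
class Job:
    name: str
    reads: list[File] = field(default_factory=list)
    writes: list[File] = field(default_factory=list)
    parents: list[Job] = field(default_factory=list)   # process dependencies

shared_input = File("shared-input.db", 500, IOType.BATCH)
intermediate = File("stage1.out", 200, IOType.PIPELINE)
final_output = File("result.dat", 5, IOType.ENDPOINT)

stage1 = Job("stage1", reads=[shared_input], writes=[intermediate])
stage2 = Job("stage2", reads=[shared_input, intermediate],
             writes=[final_output], parents=[stage1])
```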

  4. Batch-pipelined workloads
     [diagram: several pipelines of jobs, each passing pipeline data between its stages, sharing common batch datasets, and producing endpoint data at the edges]

  5. Wide-area grid computing
     [diagram: home storage connected to remote compute sites across the Internet]

  6. Cluster-to-cluster (c2c)
      • Not quite p2p
        • More organized
        • Less hostile
        • More homogeneity
        • Correlated failures
      • Each cluster is autonomous
        • Run and managed by different entities
      • An obvious bottleneck is the wide-area link
      How do we manage the flow of data into, within, and out of these clusters?

  7. Current approaches
      • Remote I/O
        • Condor standard universe
        • Very easy
        • Consistency through serialization
      • Prestaging
        • Condor vanilla universe
        • Manually intensive
        • Good performance through knowledge
      • Distributed file systems (AFS, NFS)
        • Easy to use, uniform name space
        • Impractical in this environment

  8. Pros and cons

  9. BAD-FS
      • Solution: Batch-Aware Distributed File System
      • Leverages workload information with storage control
        • Detailed information about the workload is known
        • The storage layer allows external control
        • An external scheduler makes informed storage decisions
      • Combining information and control results in
        • Improved performance
        • More robust failure handling
        • Simplified implementation

  10. Practical and deployable
      • User-level; requires no privilege
      • Packaged as a modified Condor system
        • A Condor system which includes BAD-FS
      • General; glide-in works everywhere
      [diagram: BAD-FS instances glided in over SGE clusters, reached from home storage across the Internet]

  11. BAD-FS == Condor ++
      [diagram: compute nodes running Condor startd and BAD-FS, NeST storage servers, a job queue, and home storage, coordinated by the BAD-FS scheduler]
      1) NeST storage management
      2) Batch-Aware Distributed File System
      3) Expanded Condor submit language
      4) BAD-FS scheduler (Condor DAGMan ++)

  12. BAD-FS knowledge
      • Remote cluster knowledge
        • Storage availability
        • Failure rates
      • Workload knowledge
        • Data type (batch, pipeline, or endpoint)
        • Data quantity
        • Job dependencies

  13. Control through lots
      • Abstraction that allows external storage control
        • Guaranteed storage allocations
        • Containers for job I/O
        • e.g. “I need 2 GB of space for at least 24 hours”
      • Scheduler
        • Creates lots to cache input data
          • Subsequent jobs can reuse this data
        • Creates lots to buffer output data
          • Destroys pipeline, copies endpoint
        • Configures workload to access lots
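
The lot abstraction can be pictured as a small allocation interface between the scheduler and each storage server. The sketch below is a hypothetical illustration of that idea, using the 2 GB / 24 hour example from the slide; the names Lot and request_lot and their behavior are assumptions, not the actual NeST/BAD-FS protocol.

```python
# Hypothetical sketch of a lot-style storage allocation interface.
# Class and function names are illustrative; the real BAD-FS/NeST
# protocol is not reproduced here.
import time

class Lot:
    """A guaranteed storage allocation on one remote storage server."""

    def __init__(self, server: str, size_bytes: int, duration_s: int):
        self.server = server
        self.size_bytes = size_bytes
        self.expires_at = time.time() + duration_s
        self.used_bytes = 0

    def write(self, nbytes: int) -> None:
        # In this sketch the lot tracks usage and refuses writes beyond
        # its guaranteed size instead of silently spilling elsewhere.
        if self.used_bytes + nbytes > self.size_bytes:
            raise IOError("lot capacity exceeded")
        self.used_bytes += nbytes

def request_lot(server: str, size_gb: float, hours: float) -> Lot:
    """e.g. 'I need 2 GB of space for at least 24 hours'."""
    return Lot(server, int(size_gb * 2**30), int(hours * 3600))

# Scheduler-side usage: a batch lot caches shared input so later jobs can
# reuse it; a pipeline lot buffers intermediate output and is destroyed
# once the endpoint data has been copied home.
batch_lot = request_lot("storage.cluster-a.example", size_gb=2.0, hours=24)
pipeline_lot = request_lot("storage.cluster-a.example", size_gb=0.5, hours=24)
```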

  14. Knowledge plus control
      • Enhanced performance
        • I/O scoping
        • Capacity-aware scheduling
      • Improved failure handling
        • Cost-benefit replication
      • Simplified implementation
        • No cache consistency protocol

  15. I/O scoping
      • Technique to minimize wide-area traffic
      • Allocate lots to cache batch data
      • Allocate lots for pipeline and endpoint data
      • Extract endpoint
      • Cleanup
      • Example (AMANDA): 200 MB pipeline, 500 MB batch, 5 MB endpoint
        • Steady state: only 5 of 705 MB traverse the wide-area link
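
The 5-of-705 MB figure follows directly from scoping each I/O class: the 500 MB batch dataset crosses the wide-area link once and is then served from a cached lot, the 200 MB of pipeline data never leaves the cluster, and only the 5 MB endpoint output returns to home storage per pipeline. A back-of-the-envelope check using the slide's numbers:

```python
# Wide-area traffic per pipeline, with and without I/O scoping.
# Sizes are the AMANDA figures quoted on the slide (in MB).
pipeline_mb, batch_mb, endpoint_mb = 200, 500, 5

# Remote I/O: every class of data traverses the wide-area link.
remote_io_mb = pipeline_mb + batch_mb + endpoint_mb   # 705 MB

# BAD-FS steady state: batch data is already cached in a lot, pipeline
# data stays on cluster storage, and only endpoint data goes home.
badfs_mb = endpoint_mb                                # 5 MB

print(f"remote I/O: {remote_io_mb} MB, BAD-FS: {badfs_mb} MB per pipeline")
```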

  16. Capacity-aware scheduling
      • Technique to avoid over-allocation
      • Scheduler has knowledge of
        • Storage availability
        • Storage usage within the workload
      • Scheduler runs as many jobs as fit
        • Avoids wasted utilization
        • Improves job throughput
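
One way to read capacity-aware scheduling is as a greedy admission check: dispatch a job only if the lot space it declares still fits in the storage available at the target cluster. The sketch below illustrates that idea under assumed inputs (JobReq, storage_mb, and the example sizes are hypothetical); it is not the actual BAD-FS scheduler.

```python
# Illustrative greedy admission check: run as many jobs as fit in the
# storage currently available, deferring the rest. Field names and the
# example numbers are hypothetical.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class JobReq:
    name: str
    storage_mb: int   # lot space the job's batch and pipeline data will need

def schedule(jobs: list[JobReq], cluster_free_mb: int) -> list[JobReq]:
    """Dispatch jobs whose storage requirements still fit; defer the rest."""
    running: list[JobReq] = []
    free = cluster_free_mb
    for job in jobs:
        if job.storage_mb <= free:      # avoid over-allocating remote storage
            running.append(job)
            free -= job.storage_mb
        # otherwise the job waits until earlier jobs release their lots
    return running

jobs = [JobReq("amanda-01", 700), JobReq("amanda-02", 700), JobReq("amanda-03", 700)]
print([j.name for j in schedule(jobs, cluster_free_mb=1500)])   # first two fit
```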

  17. Improved failure handling
      • Scheduler understands data semantics
        • Data is not just a collection of bytes
      • Losing data is not catastrophic
        • Output can be regenerated by rerunning jobs
      • Cost-benefit replication
        • Replicates only data whose replication cost is cheaper than the cost of rerunning the job
        • Can improve throughput in a lossy environment
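
The replication rule can be read as a simple comparison: copy a job's volatile output to stable storage only when copying is cheaper than rerunning the job that produced it. The sketch below is a minimal formulation under assumed parameters (bandwidth and rerun time are hypothetical), not the exact expression used by BAD-FS.

```python
# Illustrative cost-benefit replication check, following the rule on the
# slide: replicate only if copying the output is cheaper than rerunning
# the job that produced it. (A fuller model would presumably also weight
# the rerun cost by the likelihood of actually losing the data.)
def should_replicate(output_mb: float,
                     replication_bw_mbps: float,
                     rerun_cost_s: float) -> bool:
    replication_cost_s = output_mb * 8 / replication_bw_mbps   # seconds to copy
    return replication_cost_s < rerun_cost_s

# Small output from a long-running job: cheap to copy, expensive to redo.
print(should_replicate(output_mb=5, replication_bw_mbps=10, rerun_cost_s=3600))  # True
```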

  18. Simplified implementation
      • Data dependencies known
        • Scheduler ensures proper ordering
      • Build a distributed file system
        • With cooperative caching
        • But without a cache consistency protocol

  19. Real workloads
      • AMANDA: astrophysics study of cosmic events such as gamma-ray bursts
      • BLAST: biology search for proteins within a genome
      • CMS: physics simulation of large particle colliders
      • HF: chemistry study of non-relativistic interactions between atomic nuclei and electrons
      • IBIS: ecology global-scale simulation of earth’s climate used to study effects of human activity (e.g. global warming)

  20. Real workload experience
      • Setup: 16 jobs, 16 compute nodes, emulated wide-area network
      • Configurations: remote I/O, AFS-like with /tmp, BAD-FS
      • Result is an order of magnitude improvement

  21. BAD Conclusions
      • Schedulers can obtain workload knowledge
      • Schedulers need storage control
        • Caching
        • Consistency
        • Replication
      • Combining this control with knowledge
        • Enhanced performance
        • Improved failure handling
        • Simplified implementation

  22. For more information
      • “Explicit Control in a Batch-Aware Distributed File System,” John Bent, Douglas Thain, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny. NSDI ’04, 2004.
      • “Pipeline and Batch Sharing in Grid Workloads,” Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny. HPDC 12, 2003.
      • http://www.cs.wisc.edu/condor/publications.html
      • Questions?
