
Using Application Structure to Handle Failures and Improve Performance in a Migratory File Service

John Bent, Douglas Thain, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Miron Livny

WiND and Condor Project

14 April 2003

Disclaimer

We have a lot of stuff to describe, so hang in there until the end!

Outline
  • Data Intensive Applications
    • Batch and Pipeline Sharing
    • Example: AMANDA
  • Hawk: A Migratory File Service
    • Application Structure
    • System Architecture
    • Interactions
  • Evaluation
    • Performance
    • Failure
  • Philosophizing
CPU Bound
  • SETI@Home, Folding@Home, etc...
    • Excellent applications of distributed computing.
    • KB of data, days of CPU time.
    • Efficient to do tiny I/O on demand.
  • Supporting Systems:
    • Condor
    • BOINC
    • Google Toolbar
    • Custom software.
I/O Bound
  • D-Zero data analysis:
    • Excellent application for cluster computing.
    • GB of data, seconds of CPU time.
    • Efficient to compute whenever data is ready.
  • Supporting Systems:
    • Fermi SAM
    • High-throughput document scanning
    • Custom software.
Batch Pipelined Applications

[Figure: a batch of three pipelines, a1→b1→c1 through a3→b3→c3. Each stage passes pipeline-shared files (x, y, z) down its own pipeline, while all pipelines read the same batch-shared data files; the number of concurrent pipelines is the batch width.]

Example: AMANDA

[Figure: the four-stage AMANDA pipeline. corsika reads corsika_input.txt (4 KB) plus the batch-shared tables NUCNUCCS, GLAUBTAR, EGSDATA3.3, and QGSDATA4 (1 MB) and writes DAT (23 MB); corama turns DAT into corama.out (26 MB); mmc reads corama.out, mmc_input.txt, and ice tables (3 files, 3 MB) and writes mmc_output.dat (126 MB); amasim reads mmc_output.dat, amasim_input.dat, and the experiment geometry (100s of files, 500 MB) and writes amasim_output.txt (5 MB).]

Computing Environment
  • Clusters dominate:
    • Similar configurations.
    • Fast interconnects.
    • Single administrative domain.
    • Underutilized commodity storage.
    • En masse, quite unreliable.
  • Users wish to harness multiple clusters, but have jobs that are both I/O and CPU intensive.
Ugly Solutions
  • “FTP-Net”
    • User finds remote clusters.
    • Manually stages data in.
    • Submits jobs, deals with failures.
    • Pulls data out.
    • Lather, rinse, repeat.
  • “Remote I/O”
    • Submit jobs to a remote batch system.
    • Let all I/O come back to the archive.
    • Return in several decades.
What We Really Need
  • Access resources outside my domain.
    • Assemble your own army.
  • Automatic integration of CPU and I/O access.
    • Forget optimal: save administration costs.
    • Replacing remote with local always wins.
  • Robustness to failures.
    • Can’t hire babysitters for New Year’s Eve.
Hawk: A Migratory File Service
  • Automatically deploys a “task force” across an existing distributed system.
  • Manages applications from a high level, using knowledge of process interactions.
  • Provides dependable performance through peer-to-peer techniques.
  • Understands and reacts to failures using knowledge of the system and workloads.
Philosophy of Hawk

“In allocating resources, strive to avoid disaster, rather than attempt to obtain an optimum.” - Butler Lampson

Why not AFS+Make?
  • Quick answer:
    • Distributed filesystems provide an unnecessarily strong abstraction that is unacceptably expensive to implement in the wide area.
  • Better answer after we explain what Hawk is and how it works.
Outline
  • Data Intensive Applications
    • Batch and Pipeline Sharing
    • Example: AMANDA
  • Hawk: A Migratory File Service
    • Application Structure
    • System Architecture
    • Interactions
  • Evaluation
    • Performance
    • Failure
  • Philosophizing
Workflow Language 1

job a a.sub
job b b.sub
job c c.sub
job d d.sub
parent a child c
parent b child d

[Figure: the resulting DAG — job a precedes c, and job b precedes d.]

Workflow Language 2

volume v1 ftp://home/mydata
mount v1 a /data
mount v1 b /data
volume v2 scratch
mount v2 a /tmp
mount v2 c /tmp
volume v3 scratch
mount v3 b /tmp
mount v3 d /tmp

[Figure: read volume v1 maps the home-storage data "mydata" into jobs a and b at /data; scratch volumes v2 and v3, mounted at /tmp, link pipeline a→c and pipeline b→d.]

Workflow Language 3

extract v2 x ftp://home/out.1
extract v3 x ftp://home/out.2

[Figure: at workflow completion, file x in scratch volume v2 is committed back to home storage as out.1, and x in v3 as out.2; everything else in the scratch volumes is discarded.]
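
Taken together, these three slides define a small declarative language of jobs, edges, volumes, mounts, and extracts. Below is a minimal sketch, in C, of how a coordinator could parse such directives into records; this is hypothetical illustration, not Hawk's actual parser.

#include <stdio.h>

/* Minimal illustrative parser for the workflow directives above. */
struct volume  { char name[16], source[64]; };
struct mount   { char vol[16], job[16], path[64]; };
struct extract { char vol[16], file[64], dest[64]; };

int main(void) {
    char line[256], job[16], sub[64], parent[16], child[16];
    struct volume v; struct mount m; struct extract e;

    while (fgets(line, sizeof line, stdin)) {
        if (sscanf(line, "job %15s %63s", job, sub) == 2)
            printf("job %s is described by %s\n", job, sub);
        else if (sscanf(line, "parent %15s child %15s", parent, child) == 2)
            printf("edge: %s must finish before %s\n", parent, child);
        else if (sscanf(line, "volume %15s %63s", v.name, v.source) == 2)
            printf("volume %s backed by %s\n", v.name, v.source);
        else if (sscanf(line, "mount %15s %15s %63s", m.vol, m.job, m.path) == 3)
            printf("job %s sees volume %s at %s\n", m.job, m.vol, m.path);
        else if (sscanf(line, "extract %15s %63s %63s", e.vol, e.file, e.dest) == 3)
            printf("commit %s from %s to %s\n", e.file, e.vol, e.dest);
    }
    return 0;
}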

Mapping Logical to Physical
  • Abstract Jobs
    • Physical jobs in a batch system
    • May run more than once!
  • Logical “scratch” volumes
    • Temporary containers on a scratch disk.
    • May be created, replicated, and destroyed.
  • Logical “read” volumes
    • Striped across cooperative proxy caches.
    • May be created, cached, and evicted.
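
These mappings might be represented with bookkeeping structures like the following minimal C sketch; all names are illustrative assumptions, not taken from the Hawk implementation.

#include <stdio.h>

/* Hypothetical bookkeeping a workflow manager might keep; illustrative only. */
enum vol_kind { VOL_SCRATCH, VOL_READ };

struct physical_copy {
    const char *host;   /* where this instance currently lives */
    int alive;          /* cleared when a failure is detected  */
};

struct logical_volume {
    const char *name;
    enum vol_kind kind;             /* container vs. cached read volume */
    struct physical_copy copies[4]; /* replicas: created, destroyed, evicted */
    int ncopies;
};

struct logical_job {
    const char *name;
    int runs;           /* an abstract job may run more than once */
};

int main(void) {
    struct logical_volume v2 = { "v2", VOL_SCRATCH, { { "host5", 1 } }, 1 };
    struct logical_job a = { "a", 0 };
    a.runs++;           /* first attempt; a retry after failure increments again */
    printf("volume %s: %d live copy(ies); job %s ran %d time(s)\n",
           v2.name, v2.ncopies, a.name, a.runs);
    return 0;
}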
Starting System

[Figure: the starting point — an existing Condor pool and a PBS cluster with its head node, each comprising several nodes; an archive holds the input data, and the workflow manager, match maker, and batch queue stand alongside.]

Gliding In

[Figure: a glide-in job submitted through the PBS head node and the Condor pool starts a master, a proxy, and a startd on every node, overlaying a uniform Hawk environment on both clusters; the archive, match maker, and batch queue remain as before.]

Hawk Architecture

[Figure: the workflow manager drives the system using the application flow and a system model; on each node a job runs over an agent and a startd, and the proxies form cooperative caches joined by wide-area caching back to the archive; the match maker and batch queue place jobs.]

I/O Interactions

[Figure: a job issues creat("/tmp/outfile") and open("/data/d15") through the POSIX library interface; the agent's mount table resolves /tmp to container://host5/120 and /data to cache://host5/archive/data; the local proxy holds the containers and a cooperative block cache shared with other proxies, all backed by the archive.]
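
The mount table suggests a simple path-rewriting step at the POSIX boundary. Here is a minimal sketch of that translation; the two mount entries come from the slide, but the code itself is a hypothetical illustration, not the actual agent.

#include <stdio.h>
#include <string.h>

/* Illustrative path translation at the POSIX boundary. */
struct mount_entry { const char *prefix, *target; };

static const struct mount_entry mtab[] = {
    { "/tmp",  "container://host5/120" },
    { "/data", "cache://host5/archive/data" },
};

/* Rewrite "/data/d15" -> "cache://host5/archive/data/d15", etc. */
static int resolve(const char *path, char *out, size_t len) {
    size_t i, n;
    for (i = 0; i < sizeof mtab / sizeof mtab[0]; i++) {
        n = strlen(mtab[i].prefix);
        if (strncmp(path, mtab[i].prefix, n) == 0 &&
            (path[n] == '/' || path[n] == '\0')) {
            snprintf(out, len, "%s%s", mtab[i].target, path + n);
            return 0;
        }
    }
    return -1;  /* not mounted: pass through to the real open() */
}

int main(void) {
    const char *calls[] = { "/tmp/outfile", "/data/d15" };
    char phys[256];
    int i;
    for (i = 0; i < 2; i++)
        if (resolve(calls[i], phys, sizeof phys) == 0)
            printf("%s -> %s\n", calls[i], phys);
    return 0;
}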

Cooperative Proxies

[Figure: jobs on three nodes reach proxies A, B, and C through their agents; a hash map from paths to proxies lets each proxy discover which peer owns a block, so over time (t1 through t4) a block is fetched once from the archive and thereafter passed proxy-to-proxy.]
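
One plausible realization of the paths-to-proxies hash map is to hash the path and index into the proxy list, so every agent and proxy agrees on an owner without communicating. The hash function and proxy names below are illustrative assumptions, not Hawk's actual scheme.

#include <stdio.h>

/* Illustrative only: deterministic path-to-proxy assignment. */
static const char *proxies[] = { "proxyA", "proxyB", "proxyC" };

static unsigned long hash_path(const char *s) {
    unsigned long h = 5381;                  /* djb2 string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

int main(void) {
    const char *paths[] = { "/data/d15", "/data/DAT", "/data/geom/g7" };
    int i;
    for (i = 0; i < 3; i++)
        printf("%s is owned by %s\n", paths[i],
               proxies[hash_path(paths[i]) % 3]);
    return 0;
}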

Summary
  • Archive
    • Sources input data, chooses coordinator.
  • Glide-In
    • Deploy a “task force” of components.
  • Cooperative Proxies
    • Provide dependable batch read-only data.
  • Data Containers
    • Fault-isolated pipeline data.
  • Workflow Manager
    • Directs the operation.
Outline
  • Data Intensive Applications
    • Batch and Pipeline Sharing
    • Example: AMANDA
  • Hawk: A Migratory File Service
    • Application Structure
    • System Architecture
    • Interactions
  • Evaluation
    • Performance
    • Failure
  • Philosophizing
Performance Testbed
  • Controlled testbed:
    • 32 dual-CPU 550 MHz cluster machines with 1 GB RAM, SCSI disks, and 100 Mb/s Ethernet.
    • Simulated WAN: archive storage placed across a router restricted to 800 KB/s.
  • Also some preliminary tests on uncontrolled systems:
    • MFS over a PBS cluster at Los Alamos.
    • MFS over a Condor system at INFN Italy.
Synthetic Apps

[Figure: three two-job (a→b) synthetic workloads — pipe-intensive (10 MB of pipeline data), mixed (5 MB pipeline + 5 MB batch), and batch-intensive (10 MB of batch data) — run across the system configurations.]
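
A synthetic job of this kind could be as simple as the following sketch, which reads a configurable amount of batch-shared data and writes a configurable amount of pipeline data. The 10 MB and 5 MB splits come from the figure; the program itself, including its file paths, is a hypothetical stand-in for the real benchmark.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical synthetic pipeline stage: read batch-shared input,
   then write pipeline output for the next job in the pipeline. */
enum { MB = 1024 * 1024 };

int main(int argc, char **argv) {
    long batch_mb = argc > 1 ? atol(argv[1]) : 5;   /* MB of batch reads */
    long pipe_mb  = argc > 2 ? atol(argv[2]) : 5;   /* MB of pipe writes */
    char buf[4096] = {0};
    long n;

    FILE *in = fopen("/data/batch.in", "rb");       /* batch-shared volume */
    if (in) {
        for (n = 0; n < batch_mb * MB; n += (long)sizeof buf)
            if (fread(buf, 1, sizeof buf, in) == 0) break;
        fclose(in);
    }
    FILE *out = fopen("/tmp/pipe.out", "wb");       /* scratch volume */
    if (out) {
        for (n = 0; n < pipe_mb * MB; n += (long)sizeof buf)
            fwrite(buf, 1, sizeof buf, out);
        fclose(out);
    }
    return 0;
}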

Real Applications
  • BLAST
    • Search tool for proteins and nucleotides in genomic databases.
  • CMS
    • Simulation of a high-energy physics experiment to begin operation at CERN in 2006.
  • H-F
    • Simulation of the non-relativistic interactions between nuclei and electrons.
  • AMANDA
    • Simulation of a neutrino detector buried in the ice of the South Pole.
Outline
  • Data Intensive Applications
    • Batch and Pipeline Sharing
    • Example: AMANDA
  • Hawk: A Migratory File Service
    • Application Structure
    • System Architecture
    • Interactions
  • Evaluation
    • Performance
    • Failure
  • Philosophizing
Related Work
  • Workflow management
  • Dependency managers: TREC, make.
  • Private namespaces: UFO, database views.
  • Cooperative caching: no writes.
  • P2P systems: wrong semantics.
  • Filesystems: overly strong semantics.
Why Not AFS+Make?
  • Namespaces
    • Constructed per-process at submit-time
  • Consistency
    • Enforced at the workflow level
  • Selective Commit
    • Everything tossed unless explicitly saved.
  • Fault Awareness
    • CPUs and data can be lost at any point.
  • Practicality
    • No special permission required.
Conclusions
  • Traditional systems build from the bottom up: this disk must have five nines, or we’re in big trouble!
  • MFS builds from the top down: application semantics drive system structure.
  • By posing the right problem, we solve the traditional hard problems of file systems.
For More Info...
  • Paper in progress...
  • Application study:
    • “Pipeline and Batch Sharing in Grid Workloads”, to appear in HPDC-2003.
    • www.cs.wisc.edu/condor/doc/profiling.ps
  • Talk to us!
  • Questions now?