Parrot: Transparent User-Level Middleware for Data-Intensive Computing. Douglas Thain, Condor Project, University of Wisconsin. Workshop on Adaptive Grid Middleware, 28 September 2003.


Parrot: Transparent User-Level Middleware for Data-Intensive Computing

Douglas Thain

Condor Project, University of Wisconsin

Workshop on Adaptive Grid Middleware

28 September 2003


The Reality of the Grid

afwuhweiuhsdvxmndf

(and then a miracle happens)

P=NP

I think you have a problem here...

Look at my new proof!



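[Diagram: Parrot and the user's app. Labels recovered from the figure:]

  • I/O Interface (open, close, read, write, lseek)

    • The user's app accesses data on the local operating system or on a storage server.

    • Storage protocols: Chirp, FTP, NeST, RFIO, DCAP

  • Process Interface (main, exit, abort, kill, sleep)

    • A batch system tells the local operating system to "run this batch job".

    • Batch systems: Condor, PBS, NQE, LSF, LoadLeveler

  • Parrot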


Applications of Parrot

  • Interactive Browsing

    • tcsh, tar, gzip, make, acroread, gv, xv...

  • Improved Reliability

    • Transparent retry/reassignment/reallocation

    • Files, sockets, even repair broken apps.

  • Private Namespaces

    • Make /home/thain appear the same everywhere.

    • Make /usr/data/calibration different everywhere.

  • Dynamic/Distributed Program Construction

    • Remote link, remote exec, remote eval...

  • Profiling and Debugging

    • Users may not know low-level I/O patterns.


Challenges

  • Technical Methods of Interposition

  • Semantic Differences

  • Error Management

  • CPU – I/O Integration

  • Performance

  • The butterfly effect:

    • Subtle underlying differences can have large effects in performance and usability.


Internal Techniques

[Diagram: three internal techniques: Binary Rewriting, Polymorphic Extension, and Static or Dynamic Re-Linking. Box labels: App Code, New Code, Standard Library, New Library, and a Library containing M1, M2, and NEW.]


External Techniques

[Diagram: three external techniques: Debugger Trap, Remote Filesystem, and Kernel Callout. Each panel shows an App, an Agent, and the Kernel with its filesystems (NFS, LFS, FFS; one panel shows NFS, LFS, USR).]



Hole Detection Matters

  • Dynamic Linking

    • Bypass Toolkit, ca. 2000

    • Works with some standard tools.

    • Many still crash in strange ways.

    • Doesn’t apply to static exes; always a surprise.

  • Debugger Trap

    • Parrot: Coding began in May of 2003.

    • Works reliably with almost everything in /usr/bin.

    • Caveat #1: Twice as much code

    • Caveat #2: Higher latency


Debugger Trap

  • For the rest of this talk, we select the debugger trap for completeness and reliability. Much of the discussion still applies to the other techniques too.

  • Some technical details in the paper:

    • Only on Linux.

    • Must manage process ancestry.

    • Must fudge some broken ptrace behavior.

    • Cannot write directly to process, must take roundabout path through temp file.



[Diagram: Parrot internals. Labels recovered from the figure:]

  • User Process: issues SYS_open, SYS_read, SYS_write

  • (debugger trap): routes them to parrot_open, parrot_read, parrot_write

  • File Descriptors: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...

  • File Pointers: pos: 100, pos: 0, pos: 0, pos: 1 MB, pos: 42

  • File Objects: “outfile”, “infile”, “config”, “data”

  • name resolver: mount list driver, chirp lookup driver

  • Device Drivers: Local Driver, Chirp Driver, FTP Driver, NeST Driver, RFIO Driver, DCAP Driver
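The structures named in this diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Parrot's code: the class names (`MemoryDriver`, `FileObject`, `FilePointer`, `Tracer`) and all method signatures are hypothetical, the driver is an in-memory stand-in, and the name resolver is left out. It shows a trapped call being dispatched to a `parrot_*` handler, which follows descriptor, then pointer, then object, then driver.

```python
from dataclasses import dataclass


class MemoryDriver:
    """Hypothetical stand-in for a device driver (Local, Chirp, FTP, ...)."""

    def __init__(self):
        self.files = {}  # file name -> bytearray holding its contents

    def open(self, name):
        self.files.setdefault(name, bytearray())

    def read(self, name, pos, length):
        return bytes(self.files[name][pos:pos + length])

    def write(self, name, pos, data):
        buf = self.files[name]
        if len(buf) < pos:
            buf.extend(bytes(pos - len(buf)))  # zero-fill any gap before pos
        buf[pos:pos + len(data)] = data
        return len(data)


@dataclass
class FileObject:
    name: str
    driver: MemoryDriver


@dataclass
class FilePointer:
    obj: FileObject
    pos: int = 0  # current seek position, as in "pos: 42" on the slide


class Tracer:
    """Routes trapped system calls to parrot_* handlers (simulation only)."""

    def __init__(self, driver):
        self.driver = driver
        self.fds = {}  # file descriptor -> FilePointer
        self.handlers = {
            "SYS_open": self.parrot_open,
            "SYS_read": self.parrot_read,
            "SYS_write": self.parrot_write,
        }

    def trap(self, syscall, *args):
        return self.handlers[syscall](*args)

    def parrot_open(self, name):
        self.driver.open(name)
        fd = 0
        while fd in self.fds:  # pick the lowest unused descriptor
            fd += 1
        self.fds[fd] = FilePointer(FileObject(name, self.driver))
        return fd

    def parrot_read(self, fd, length):
        ptr = self.fds[fd]
        data = ptr.obj.driver.read(ptr.obj.name, ptr.pos, length)
        ptr.pos += len(data)
        return data

    def parrot_write(self, fd, data):
        ptr = self.fds[fd]
        written = ptr.obj.driver.write(ptr.obj.name, ptr.pos, data)
        ptr.pos += written
        return written
```

Each open here creates its own pointer, so two descriptors on the same file keep independent positions. A real tracer would intercept the calls with ptrace rather than receive them through a `trap` method.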


Adaptation

[Diagram: three panels (on same host, on nearby host, on distant host). In each, the App calls open(“/mydata/foo”) and Parrot, holding Local, Chirp, and FTP drivers, redirects the call according to its mount list. Mount entries shown:]

  • /mydata -> /usr/data (on same host, served from the local /usr/data)

  • /mydata -> /chirp/host1/usr/mydata (served by chirpd)

  • /mydata -> /ftp/host2/opt/DAT (served by ftpd from /opt/DAT)
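The redirection pictured here can be sketched as a two-step lookup: rewrite the logical path through the mount list, then pick a driver from the leading component of the result. This is a sketch under assumptions, not Parrot's resolver: the function names `apply_mounts` and `select_driver` are hypothetical, and only the `/chirp/<host>/...` and `/ftp/<host>/...` forms appear on the slide; treating `nest`, `rfio`, and `dcap` the same way is a guess.

```python
# Driver names that are assumed to appear as the first path component.
# Only "chirp" and "ftp" are shown on the slide; the rest are assumptions.
REMOTE_DRIVERS = {"chirp", "ftp", "nest", "rfio", "dcap"}


def apply_mounts(path, mount_list):
    """Rewrite path using the longest matching (prefix, target) entry."""
    best = None
    for prefix, target in mount_list:
        prefix = prefix.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, target)
    if best is None:
        return path  # no entry applies: leave the path unchanged
    prefix, target = best
    return target.rstrip("/") + path[len(prefix):]


def select_driver(physical):
    """Split a rewritten path into (driver, host, path on that host)."""
    parts = physical.split("/", 3)  # e.g. ['', 'chirp', 'host1', 'usr/x']
    if len(parts) >= 3 and parts[1] in REMOTE_DRIVERS and parts[2]:
        rest = "/" + parts[3] if len(parts) > 3 else "/"
        return parts[1], parts[2], rest
    return "local", None, physical


def resolve(path, mount_list):
    return select_driver(apply_mounts(path, mount_list))
```

With this shape, the application always opens `/mydata/foo`; only the mount list differs from host to host, for example `resolve("/mydata/foo", [("/mydata", "/ftp/host2/opt/DAT")])` selects the FTP driver for `host2`.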


What Protocol?

  • File Transfer Protocol:

    • Internet standard, many implementations.

    • High bandwidth sequential access.

  • NeST

    • General purpose storage appliance from UW.

    • Virtual users, namespace, and allocation.

  • RFIO:

    • Remote I/O protocol used with CERN CASTOR.

    • UNIX like, most ops require a new TCP connection.

  • DCAP

    • Remote I/O protocol used with Fermi D-Cache

    • UNIX like, WORM semantics, no directories, caching.

  • Chirp:

    • Protocol developed @ UW for Parrot.

    • Corresponds very closely to UNIX, incl errnos.


Small Details Matter

  • Standard tools need to know subtle details, otherwise, they break:

    • ls -lR performs getdents(“foo”)

    • on success: descend

    • on ENOTDIR: display and continue

    • on ENOENT: display error and stop.

  • FTP does not provide this detail

    • Failed LIST -> error 550

    • Failed GET -> error 550

    • Failed CWD -> error 550

  • Simple assignment doesn’t work:

    • Making 550=ENOENT breaks many tools.
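The breakage described above can be reproduced with a small simulation. This is an illustrative sketch, not the source of `ls`: `list_recursive` imitates the three branches the slide attributes to `ls -lR`, and `make_getdents` is a hypothetical fake filesystem over a nested dict. Setting `collapse_errors=True` models a driver that reports every failure as ENOENT, the way a blanket 550 = ENOENT mapping would.

```python
import errno


def make_getdents(tree, collapse_errors=False):
    """Build a fake getdents over a nested dict (None marks a plain file)."""

    def getdents(path):
        node = tree
        for part in path.split("/"):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                raise OSError(errno.ENOENT, "no such entry", path)
        if not isinstance(node, dict):
            # A plain file: the precise answer is ENOTDIR.
            code = errno.ENOENT if collapse_errors else errno.ENOTDIR
            raise OSError(code, "cannot list", path)
        return sorted(node)

    return getdents


def list_recursive(path, getdents, out):
    """Descend on success, display on ENOTDIR, stop on any other error."""
    try:
        entries = getdents(path)
    except OSError as exc:
        if exc.errno == errno.ENOTDIR:
            out.append(path)  # plain file: display and continue
            return True
        out.append("error: %s: %s" % (path, errno.errorcode[exc.errno]))
        return False  # display error and stop
    out.append(path + "/")
    for name in entries:
        if not list_recursive(path + "/" + name, getdents, out):
            return False
    return True
```

With precise errors the whole tree is listed; with collapsed errors the walk stops at the first plain file, because the tool cannot tell "not a directory" from "does not exist".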


Example Solution

[Flowchart: three probes, LIST “foo”, CWD “foo”, and SIZE “foo”, each branching on the reply: 200, 550, or other. Outcomes: Success, Transient Error, Not a dir., Access denied., No such entry.]
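One plausible reading of this flowchart is sketched below. The edge assignments are an assumption, since the flattened figure does not show which reply leads to which outcome: here a failed LIST is followed by CWD (success meaning the directory exists but cannot be listed), then by SIZE (success meaning the name is a plain file). The function name `ftp_list_errno`, the `send` callback, and the use of EAGAIN for "Transient Error" are all hypothetical. The slide's "200" is kept as shorthand for success; real servers answer these commands with other success codes such as 226, 250, and 213.

```python
import errno

OK = 200        # the slide's shorthand for "command succeeded"
FAILED = 550    # FTP's generic "requested action not taken"


def ftp_list_errno(send, name):
    """Return 0 on success, else a UNIX errno inferred by extra probes.

    send(command, name) is a caller-supplied function returning the
    numeric reply code for one FTP command.
    """
    reply = send("LIST", name)
    if reply == OK:
        return 0
    if reply != FAILED:
        return errno.EAGAIN       # "other": treat as a transient error

    reply = send("CWD", name)
    if reply == OK:
        return errno.EACCES       # a directory we can enter but not list
    if reply != FAILED:
        return errno.EAGAIN

    reply = send("SIZE", name)
    if reply == OK:
        return errno.ENOTDIR      # it exists and has a size: a plain file
    if reply != FAILED:
        return errno.EAGAIN
    return errno.ENOENT           # every probe failed: no such entry
```

The extra round trips are the cost of recovering the detail that a bare 550 hides.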


CPU-IO Integration

  • Errors that cannot be expressed in the client’s interface must be passed to a higher level (the batch system).

  • Simple options:

    • kill -9 application (retry app elsewhere)

    • exit(1) application (don’t retry app)

  • Complex options: (Condor only)

    • restart with (Subnet!=“128.101.175”)

    • restart with (CurrentTime>5pm)
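The two simple options work because the batch system can tell them apart from the way the process ended. The sketch below shows that distinction from the supervising side, using Python's `subprocess` convention that a negative return code means death by signal. It is an assumption-laden illustration, not Condor's logic: `interpret` and `run_job` are hypothetical names, and the policy strings simply restate the slide. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def interpret(returncode):
    """Map a finished job's return code to the slide's two policies."""
    if returncode == 0:
        return "done"
    if returncode == -signal.SIGKILL:
        # Killed with signal 9: the environment failed, not the program.
        return "retry elsewhere"
    if returncode == 1:
        # The application itself exited with status 1.
        return "do not retry"
    return "unknown"


def run_job(argv):
    """Run one job to completion and classify how it ended."""
    return interpret(subprocess.run(argv).returncode)


if __name__ == "__main__":
    # A job that exits with status 1, as in the "don't retry" case.
    print(run_job([sys.executable, "-c", "import sys; sys.exit(1)"]))
```

The complex options go further by attaching a new requirement to the restart, which needs a batch system that can express such constraints.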


Bandwidth by Protocol

[Chart not preserved. Annotations: (unix default hint), (parrot default hint).]



Andrew-Like Benchmark

  • Original Andrew benchmark is no longer appropriate, so replace with the Parrot source: 296 files, 955 KB.

  • Copy the source to a remote device, then manipulate in five stages:

    • copy: cp -rp

    • list: ls -lR

    • scan: grep -r searchstring *

    • make: make

    • delete: rm -rf *
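A timing harness for these five stages might look like the sketch below. It is not the script used for the talk: `run_stages` and `ANDREW_LIKE_STAGES` are hypothetical names, and the command arguments (`{src}`, `{work}`, `make -C`) are assumptions, since the slide gives only the commands and flags. It assumes a POSIX shell. The final stage deletes the working copy, so `work` should be a scratch location.

```python
import shlex
import subprocess
import time

# Stage templates; {src} and {work} are filled in by run_stages.
ANDREW_LIKE_STAGES = [
    ("copy", "cp -rp {src} {work}"),
    ("list", "ls -lR {work}"),
    ("scan", "grep -r searchstring {work}"),
    ("make", "make -C {work}"),
    ("delete", "rm -rf {work}"),
]


def run_stages(stages, src, work):
    """Run each stage in order; return (name, seconds, exit status)."""
    results = []
    for name, template in stages:
        command = template.format(src=shlex.quote(src),
                                  work=shlex.quote(work))
        start = time.perf_counter()
        proc = subprocess.run(command, shell=True,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        results.append((name, elapsed, proc.returncode))
    return results
```

To measure a remote device, `work` would name a path that Parrot redirects to that device, and the harness itself would run under Parrot.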






Moral of the story:

  • The butterfly effect: Small underlying differences can have big effects on performance and reliability.

  • Examples in interposition:

    • Dynamic linking: fast but poor hole detection.

    • Debugger trap: slow but good hole detection.

  • Examples in protocols:

    • Chirp: UNIX semantics restrict bandwidth.

    • FTP: Need for multiple ops increases latency.

    • NeST: Powerful virtualization increases latency.

    • RFIO: Connection per op doesn’t scale.


For more info...

  • Douglas Thain

  • Miron Livny

  • Software, manuals, more info:

    • http://www.cs.wisc.edu/condor/parrot

  • The Condor Project:

    • http://www.cs.wisc.edu/condor

