www.inter-mezzo.org

A new Distributed File System

Peter J. Braam, [email protected]

Carnegie Mellon University & Stelias Computing

InterMezzo, PJ Braam, CMU


Overview

  • Joint work with

    • Michael Callahan & Phil Schwan

  • Distributed file systems

    • protocols, semantics, usage patterns

  • InterMezzo

    • purpose, design, implementation

  • Project plans

Distributed File Systems

Distributed File Systems

  • Purpose: make remote files behave as if local

    • Clients: receivers of files, suppliers of updates

    • Servers: suppliers of files, receivers of updates

  • Challenges:

    • Semantics and protocols of sharing

    • Performance

    • Implementation and correctness

  • Newer features

    • disconnection, reconnections, server replication, validation and conflict resolution

Semantics

  • Unix I/O model:

    • shared memory model

    • writes visible to readers immediately

    • last write wins

  • Network file systems

    • Weak semantics: “aging”, “timeout” (NFS, SMB)

    • Unix semantics: Sprite, DCE/DFS, XFS

    • New semantics: Coda/InterMezzo/AFS

Network Semantics

  • Propagate writes upon close: last close wins

  • Callbacks - guarantee currency

    • Client continues to use files until notified by server

    • No connected client ever sees stale data

    • Server maintains state

  • Permits/Tokens - guarantee exclusivity

    • Client propagates updates lazily until notified

    • A major performance gain

    • Server maintains state

  • Validation after reconnecting: version stamps
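The validation step can be sketched in C. This is a minimal illustration, assuming a per-volume version stamp that the server bumps on every update; the names (`struct volume`, `validate_volume`) are invented for the sketch, not InterMezzo's actual API:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch: each folder collection ("volume") carries a
   version stamp that the server increments on every accepted update. */
struct volume {
    uint64_t version;   /* client's cached version stamp */
};

/* Validate after reconnecting: if the server's stamp matches the cached
   one, the cached data is still current and a callback can simply be
   reestablished. Returns 1 if the cache is valid, 0 if it is stale. */
int validate_volume(struct volume *v, uint64_t server_version)
{
    if (v->version == server_version)
        return 1;                 /* valid: no refetch needed */
    v->version = server_version;  /* remember the new stamp */
    return 0;                     /* stale: refetch / resolve */
}
```

A single stamp comparison replaces per-file revalidation, which is what makes reconnection cheap.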

Tradeoffs

  • No semantics:

    • works amazingly well (so does C++ & US Government)

  • Unix semantics:

    • well defined, must propagate writes

    • not suitable for modest-bandwidth networks

    • suitable for SAN file systems

  • Network semantics:

    • optimal for lower bandwidth situations, scales well

    • fails with heavy write/write sharing

Our inspiration: Coda Features

  • disconnected operation

  • server replication

  • reintegration, resolution

  • bandwidth adaptation

  • good security model

  • write back caching

Performance

  • Synchronous = BAD

    • RPCs take long

    • context switch to cache manager takes long

    • disk writes take long

  • InterMezzo

    • exploits good disk file systems

    • normal case: speed of local disk file system

    • gives kernel autonomy

    • does write back caching at kernel level
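The normal-case path can be sketched in C: instead of a synchronous upcall per operation, the kernel appends a record to a modification log that the cache manager drains later. The names and record format below are invented for illustration, not InterMezzo's actual journal layout:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative in-memory kernel modification log for write-back
   caching: operations complete at local-disk speed; Lento picks the
   records up asynchronously. */
#define LOG_SIZE 4096

static char log_buf[LOG_SIZE];
static int  log_len;

/* Append one "op path" record. Returns 0 on success, or -1 if the log
   is full and the cache manager must be woken to drain it first. */
int journal_op(const char *op, const char *path)
{
    char rec[256];
    int n = snprintf(rec, sizeof(rec), "%s %s\n", op, path);
    if (n < 0 || (size_t)n >= sizeof(rec) || log_len + n > LOG_SIZE)
        return -1;
    memcpy(log_buf + log_len, rec, (size_t)n);
    log_len += n;
    return 0;
}
```

The key point is that the journal append is the only work on the critical path; no RPC, no context switch to the cache manager.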

InterMezzo

InterMezzo Strategy

  • Protocol

    • Retain much of Coda’s protocols and semantics

  • Performance & scalability:

    • leverage disk file systems for cache: filter driver

    • more kernel autonomy: kernel write back cache

  • Implementation:

    • make it SIMPLE

    • leverage existing code: TCP, diskfs, rsync

    • avoid threads: use async I/O with completions

InterMezzo overview

(Diagram: InterMezzo architecture. User level: Lento (Perl) is the cache manager and server, handling update propagation and fetching with the InterMezzo server. Kernel level: application syscalls enter the VFS and hit the Presto filter, which asks "data fresh?" and, if not, upcalls to Lento; operations pass through to the local file system, and modifications (mkdir, create, rmdir, unlink, link, ...) are appended to the kernel update journal, the kernel modification log.)

Example of kernel code

int presto_file_open(struct dentry *de)
{
        int rc;

        if (IAMLENTO) {
                /* Lento's own opens bypass the filter */
                rc = bottom_fops->open(de);
                mark_dentry(de, HAVE_DATA);
                return rc;
        }

        if (!check_dentry(de, HAVE_DATA))
                lento_open_file(de);         /* upcall: fetch data */

        rc = bottom_fops->open(de);
        journal("open", de->d_name);         /* kernel modification log */
        return rc;
}

(Margin labels: cache management, access filter, upcall, write-back caching.)

Overview of functionality

  • Keep folder-collection replicas in sync

  • Disconnected operation & reintegration

(Diagram: keeping replicas in sync)

1. Client 1 modifies the folder collection (mkdir, create, rmdir, store, ...)

2. Client 1 reintegrates its modification log with the server

3. The server forwards the log (mkdir, create, rmdir, store, ...) to Clients 2 and 3

4. The replicators are synchronized
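The reintegration step above can be sketched as a journal replay loop. The record format and names below are invented for illustration; a real server would apply each operation against its replica and forward the same records to the other replicators:

```c
#include <assert.h>

/* Illustrative journal record: an operation plus the path it targets. */
enum op { OP_MKDIR, OP_CREATE, OP_RMDIR, OP_STORE };

struct rec {
    enum op     op;
    const char *path;
};

/* Replay a client's modification journal in order. Here "applying" a
   record just counts it; a real implementation would perform the
   operation on the local file system, detect conflicts, and forward
   the record to the remaining replicators. Returns the number of
   records reintegrated. */
int reintegrate(const struct rec *log, int n)
{
    int applied = 0;
    for (int i = 0; i < n; i++) {
        /* apply log[i].op to log[i].path ... */
        applied++;
    }
    return applied;
}
```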


(Diagram: disconnected operation and reintegration)

1. While disconnected, the client journals its modifications (store, create, rmdir, ...); the server retains journals for its disconnected replicators

2. On reconnect:

   a. the server forwards its modification journals

   b. conflicts are handled

   c. the client's journals are reintegrated

3. Client and server are synchronized

File Service Protocol

Client Server Protocol

  • File Service:

    • FetchDir

    • FetchFile

  • Modification Service:

    • Reintegrate

  • Consistency:

    • GetPermit/BreakPermit

    • Validate/BreakCallback
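The message set above can be sketched as a request header. The opcode values and struct layout are invented for illustration (InterMezzo's real wire format is not shown in the slides); the grouping follows the slide, with BreakPermit and BreakCallback flowing server-to-client:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative opcodes for the client/server protocol. */
enum imz_opcode {
    /* file service */
    IMZ_FETCHDIR      = 1,
    IMZ_FETCHFILE     = 2,
    /* modification service */
    IMZ_REINTEGRATE   = 3,
    /* consistency */
    IMZ_GETPERMIT     = 4,
    IMZ_BREAKPERMIT   = 5,   /* server -> client */
    IMZ_VALIDATE      = 6,
    IMZ_BREAKCALLBACK = 7,   /* server -> client */
};

/* Illustrative request header; the path follows it on the wire. */
struct imz_req {
    uint32_t opcode;
    uint32_t reqid;     /* matches a reply to its request */
    uint32_t pathlen;
};

/* Requests a client sends to a server (the Break* messages go the
   other way, revoking permits and callbacks). */
int is_client_request(uint32_t op)
{
    return op == IMZ_FETCHDIR || op == IMZ_FETCHFILE ||
           op == IMZ_REINTEGRATE || op == IMZ_GETPERMIT ||
           op == IMZ_VALIDATE;
}
```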

Client/Server Symmetry

  • Typical use:

    • client A fetches files

    • server fetches modified files from client A

    • client A sends modification log to server

    • server sends modification log to replicators

  • Code reuse: both need

    • Modification Log & Fileservice

    • Policy different on client and server

InterMezzo implementation

  • Coda & XFS experience

    • threading is complicated

    • distributed state & locking: hard to track

    • don’t implement your own cache

    • don’t accumulate 500,000 lines of C

  • Lento: learn from other efforts:

    • Ericsson, Teapot, XFS, ACE: async request processing

    • Completion routines & state machine

    • Verify protocol correctness with Murphi

    • High level language or framework

Blocking operations

  • Disk & network I/O

  • Proactive Reactor:

    • start asynchronous operation

    • give continuation & context to reactor

    • reactor activates completion routine

  • Advantages:

    • avoid threading, locking

    • very concise code describing protocols

    • state localized
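The pattern above can be sketched in a few lines of C: start an asynchronous operation, hand the reactor a continuation plus its context, and let the reactor invoke the completion routine when the I/O finishes. All names here are illustrative, and a real reactor would track many pending operations rather than one:

```c
#include <assert.h>
#include <stddef.h>

/* A continuation: called with its context when the I/O completes. */
typedef void (*completion_fn)(void *ctx, int status);

/* One pending async operation (a real reactor keeps a queue). */
struct pending {
    completion_fn done;
    void         *ctx;
};

static struct pending pending_op;

/* "Start" an async operation by registering its continuation;
   the caller returns immediately instead of blocking. */
void async_start(completion_fn done, void *ctx)
{
    pending_op.done = done;
    pending_op.ctx  = ctx;
}

/* The reactor calls this when the underlying I/O finishes. */
void reactor_complete(int status)
{
    pending_op.done(pending_op.ctx, status);
}

/* Example continuation: record the completion status in its context.
   All per-operation state lives in ctx, so no threads or locks. */
void record_status(void *ctx, int status)
{
    *(int *)ctx = status;
}
```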

PERL for our prototype

State Machine Approach

  • Introduce POE: Perl Object Environment

    • can dynamically create sessions

    • hand blocking operations to the POE kernel

    • sessions have:

      • parents

      • state on a heap (or inline, in object or class)

    • sessions do:

      • post events to other sessions

      • handle events posted to them
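The session idea carries over to C directly: a session is a current state handler plus some heap state, and posting an event dispatches to that handler, which may move the session to another state. The states and events below are invented to mirror the fetchfile example on the next slide:

```c
#include <assert.h>

/* Illustrative session: a current state handler plus per-session
   state (POE would call the latter the session's heap). */
struct session;
typedef void (*state_fn)(struct session *s, int event);

struct session {
    state_fn state;   /* handler for the current state */
    int      heap;    /* per-session state */
};

/* Posting an event dispatches it to the session's current state. */
void post(struct session *s, int event)
{
    s->state(s, event);
}

enum { EV_START, EV_HAVE_ATTR, EV_COMPLETE };

/* Terminal state: any event marks the session finished. */
void st_done(struct session *s, int event)
{
    (void)event;
    s->heap = 2;
}

/* Waiting for attributes; once they arrive, fetch data next. */
void st_wait_attr(struct session *s, int event)
{
    if (event == EV_HAVE_ATTR) {
        s->heap  = 1;
        s->state = st_done;
    }
}

/* Initial state: kick off the attribute fetch. */
void st_init(struct session *s, int event)
{
    if (event == EV_START)
        s->state = st_wait_attr;
}
```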

Example session: fetchfile

Fetchfile = new session ( {
    init => {
        if (!have_attr) post(conn, fetch_attr, have_attr);
        else            post(conn, fetch_data, complete);
    },
    have_attr => {
        if (status == success) post(conn, fetch_data, complete);
        else destruct_session(error);
    },
    new_filefetch => {
        queue_event(this);
    },
    complete => {
        reply_to_caller; handle_queue; destruct_session;
    },
    ...
} );

Wheels, drivers, filters

  • Wheels modify sessions

    • exploit asynchronous drivers e.g.:

      • read/write

      • socketfactory (accept clients)

    • filters: deliver “whole” packets e.g.:

      • full request or data packets

      • unpacked kernel requests

    • when I/O completes:

      • post to static sessions … or …

      • create dynamic session as wheel output

Our wheels...

  • Wheels:

    • Upcall: kernel requests (unpack filter)

    • Packets: network rpc/data traffic (xdr filter)

    • SocketFactory: to accept new conns

  • Instantiate request handlers:

    • net requests

    • kernel upcall requests

InterMezzo Wheels

(Diagram: InterMezzo wheels. The wheels, SocketFactory among them, turn raw sockets, network packets, kernel (Presto) upcalls, and timers into events for the static ReqDispatcher session, which spawns dynamic Upcall/Netreq sessions.)

Net Request Processing

(Diagram: net request processing. A SocketFactory acceptor(port) starts a Connection session per client, which tracks its client sessions, peer, port, etc. and receives a PacketWheel via got_wheel. Request events (req, reply, data, endreq, enddata, got_error) flow through the reqdispatcher to dynamically created request sessions.)

(Diagram: upcall processing. The UpcallWheel posts got_upcall to the ReqDispatcher, which starts upcall sessions that resolve paths to volumes and servers. Each Server object holds a connector session and the volumes hosted there; connector(host, port) uses a SocketFactory to start a Connection with its own PacketWheel, and UpcallProcessing exchanges req, reply, data, endreq, and enddata events over it via get_connection/got_connection, handling got_error along the way.)

Project

See: www.inter-mezzo.org

What we have done

  • So far mostly Linux

  • 2,500 lines of C: Linux kernel code

  • 3,800 lines of Perl

  • went through 4 total rewrites!

  • Connected & disconnected: solid

  • Reintegration: mostly working

  • Usable, not many features yet

Principal targets

  • Focus on replication, not general caching

    • scalable server replication

    • laptop/desk home directory synchronization

  • Clusters

    • install & administer one machine

    • use InterMezzo to manage all of them

Forthcoming features

  • Security

  • Conflict handling

  • Better admin tools

  • Cache manager in C

  • Variants with different semantics (locking, write sharing)

  • Windows clients (?)
