www.inter-mezzo.org

A new Distributed File System

Peter J. Braam, [email protected]

Carnegie Mellon University & Stelias Computing

InterMezzo, PJ Braam, CMU


Overview

  • Joint work with

    • Michael Callahan & Phil Schwan

  • Distributed file systems

    • protocols, semantics, usage patterns

  • InterMezzo

    • purpose, design, implementation

  • Project plans

Distributed File Systems

Distributed File Systems

  • Purpose: make remote files behave as if local

    • Clients: receivers of files, suppliers of updates

    • Servers: suppliers of files, receivers of updates

  • Challenges:

    • Semantics and protocols of sharing

    • Performance

    • Implementation and correctness

  • Newer features

    • disconnection, reconnections, server replication, validation and conflict resolution

Semantics

  • Unix I/O model:

    • shared memory model

    • writes visible to readers immediately

    • last write wins

  • Network file systems

    • Weak semantics: “aging”, “timeout” (NFS, SMB)

    • Unix semantics: Sprite, DCE/DFS, XFS

    • New semantics: Coda/InterMezzo/AFS

Network Semantics

  • Propagate writes upon close: last close wins

  • Callbacks - guarantee currency

    • Client continues to use files until notified by server

    • No connected client ever sees stale data

    • Server maintains state

  • Permits/Tokens - guarantee exclusivity

    • Client propagates updates lazily until notified

    • A major performance gain

    • Server maintains state

  • Validation after reconnecting: version stamps
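The validation step in the last bullet can be sketched in C as a version-stamp comparison; the type and field names (`im_stamp`, `store_id`) are illustrative assumptions, not InterMezzo's actual data structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version stamp kept per cached object (illustrative only). */
typedef struct {
    uint64_t store_id;   /* identifies the replica that made the last update */
    uint64_t version;    /* monotonically increasing update counter */
} im_stamp;

/* After reconnecting, a cached object is still current only if the server's
 * stamp is identical; otherwise the client must refetch or resolve. */
static bool stamp_valid(const im_stamp *client, const im_stamp *server)
{
    return client->store_id == server->store_id &&
           client->version == server->version;
}
```

This is what lets a client avoid refetching unchanged data after a disconnection: one stamp comparison per object instead of a full transfer.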

Tradeoffs

  • No semantics:

    • works amazingly well (so does C++ & US Government)

  • Unix semantics:

    • well defined, must propagate writes

    • not suitable with modest bandwidth network

    • suitable for SAN file systems

  • Network semantics:

    • optimal for lower bandwidth situations, scales well

    • fails with heavy write/write sharing

Our inspiration: Coda Features

  • disconnected operation

  • server replication

  • reintegration, resolution

  • bandwidth adaptation

  • good security model

  • write back caching

Performance

  • Synchronous = BAD

    • RPCs are slow

    • context switches to the cache manager are slow

    • disk writes are slow

  • InterMezzo

    • exploits good disk file systems

    • normal case: speed of local disk file system

    • gives kernel autonomy

    • does write back caching at kernel level
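The write-back idea in the last bullets can be sketched as a log that operations append to at local-disk speed and that is drained lazily; everything here (`mod_log`, `journal_op`, `flush_log`) is a hypothetical illustration, not the real kernel journal format:

```c
#include <assert.h>
#include <string.h>

/* Operations the kernel journals instead of sending synchronously. */
enum op { OP_MKDIR, OP_CREATE, OP_RMDIR, OP_UNLINK };

struct log_rec { enum op op; char name[64]; };

struct mod_log {
    struct log_rec rec[128];
    int n;
};

/* Append a record: this is the only cost the application sees, so the
 * normal case runs at the speed of the local disk file system. */
static void journal_op(struct mod_log *log, enum op op, const char *name)
{
    struct log_rec *r = &log->rec[log->n++];
    r->op = op;
    strncpy(r->name, name, sizeof r->name - 1);
    r->name[sizeof r->name - 1] = '\0';
}

/* Lazy propagation: drain the log toward the server, returning how many
 * records were sent (a real client would RPC each record here). */
static int flush_log(struct mod_log *log)
{
    int sent = log->n;
    log->n = 0;
    return sent;
}
```

The point of the sketch is the asymmetry: `journal_op` is on the fast path, `flush_log` runs in the background when the permit holder decides to propagate.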

InterMezzo

InterMezzo Strategy

  • Protocol

    • Retain much of Coda’s protocols and semantics

  • Performance & scalability:

    • leverage disk file systems for cache: filter driver

    • more kernel autonomy: kernel write back cache

  • Implementation:

    • make it SIMPLE

    • leverage existing code: TCP, diskfs, rsync

    • avoid threads: use async I/O with completions

InterMezzo overview

[Slide diagram: an application's syscall passes through the VFS to Presto, the InterMezzo filter driver sitting over the local disk file system. The filter asks "data fresh?"; if not, it issues an upcall (mkdir, create, rmdir, unlink, link, ...) from kernel to user level, where Lento (Perl) acts as cache manager and server, fetching files from and propagating updates to the InterMezzo server. A kernel modification log (the kernel update journal) records local changes.]

Example of kernel code

int presto_file_open(struct dentry *de)
{
        int rc;

        if ( IAMLENTO ) {
                rc = bottom_fops->open(de);
                mark_dentry(de, HAVE_DATA);
                return rc;
        }
        if ( !check_dentry(de, HAVE_DATA) )
                lento_open_file(de);
        rc = bottom_fops->open(de);
        journal("open", de->d_name);
        return rc;
}

(Slide callouts on the code above: cache management, access filter, upcall, write-back caching.)

Overview of functionality

  • Keep folder collection replicas in sync

  • Disconnected operation & reintegration



[Slide diagram: 1. Client 1 modifies a folder collection; 2. its journal (mkdir, create, rmdir, store, ...) is reintegrated to the server; 3. the server forwards the journal to Clients 2 and 3; 4. the replicators are synchronized.]



[Slide diagram: while disconnected, the client journals its modifications (store, create, rmdir, ...), and the server likewise retains journals for disconnected replicators.]

  • 2. Reconnect

    • a. Server forwards modification journals

    • b. Handle conflicts

    • c. Reintegrate client journals

  • 3. Client and server synchronized
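Step 2 can be sketched as a per-record check: a journalled update reintegrates cleanly only if the server object has not changed since the client last saw it; otherwise it is a conflict to handle before reintegration continues. The names and the version-bump rule below are illustrative assumptions, not InterMezzo's actual resolution algorithm:

```c
#include <assert.h>
#include <stdint.h>

/* Version the client observed before making its disconnected update. */
typedef struct { uint64_t base_version; } journal_rec;

/* The server's current version of the same object. */
typedef struct { uint64_t version; } server_obj;

enum reint_result { REINT_OK, REINT_CONFLICT };

static enum reint_result reintegrate_one(server_obj *obj, const journal_rec *rec)
{
    if (obj->version != rec->base_version)
        return REINT_CONFLICT;  /* someone else updated the object meanwhile */
    obj->version++;             /* apply the client's update cleanly */
    return REINT_OK;
}
```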

File Service Protocol

Client Server Protocol

  • File Service:

    • FetchDir

    • FetchFile

  • Modification Service:

    • Reintegrate

  • Consistency:

    • GetPermit/BreakPermit

    • Validate/BreakCallback
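One way to picture the call list above as a wire protocol is an opcode per call; the opcodes and header layout here are assumptions for illustration, not the actual InterMezzo protocol:

```c
#include <assert.h>
#include <stdint.h>

enum im_opcode {
    IM_FETCHDIR,      /* file service */
    IM_FETCHFILE,
    IM_REINTEGRATE,   /* modification service */
    IM_GETPERMIT,     /* consistency: client asks for exclusivity */
    IM_BREAKPERMIT,   /* server revokes it */
    IM_VALIDATE,      /* client checks version stamps after reconnect */
    IM_BREAKCALLBACK  /* server invalidates a cached object */
};

/* Hypothetical fixed request header; a path would follow it. */
struct im_request {
    uint32_t opcode;     /* one of enum im_opcode */
    uint32_t pathlen;    /* length of the path after the header */
    uint64_t version;    /* version stamp, used by IM_VALIDATE */
};

/* Only the permit and callback calls carry consistency state. */
static int is_consistency_op(uint32_t op)
{
    return op == IM_GETPERMIT || op == IM_BREAKPERMIT ||
           op == IM_VALIDATE || op == IM_BREAKCALLBACK;
}
```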

Client/Server Symmetry

  • Typical use:

    • client A fetches files

    • server fetches modified files from client A

    • client A sends modification log to server

    • server sends modification log to replicators

  • Code reuse: both need

    • Modification Log & Fileservice

    • Policy different on client and server

InterMezzo implementation

  • Coda & XFS experience

    • threading is complicated

    • distributed state & locking: hard to track

    • don’t implement your own cache

    • don’t accumulate 500,000 lines of C

  • Lento: learn from other efforts:

    • Ericsson, Teapot, XFS, ACE: async request processing

    • Completion routines & state machine

    • Verify protocol correctness with Murphi

    • High level language or framework

Blocking operations

  • Disk & network I/O

  • Proactive Reactor:

    • start asynchronous operation

    • give continuation & context to reactor

    • reactor activates completion routine

  • Advantages:

    • avoid threading, locking

    • very concise code describing protocols

    • state localized
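A toy version of this proactive-reactor pattern in C: an operation is started with a continuation and its context, and the reactor later activates the completion routine. The queue stands in for real async disk/network I/O, and all names are illustrative:

```c
#include <assert.h>

/* Completion routine: gets its registered context and the op's status. */
typedef void (*completion_fn)(void *ctx, int status);

struct pending {
    completion_fn done;
    void *ctx;
    int status;
};

#define MAX_PENDING 32
static struct pending pending_q[MAX_PENDING];
static int nqueued;

/* "Start" an async operation: record the continuation and its eventual
 * result (a real reactor would hand the op to the OS here). */
static void start_async(int status, completion_fn done, void *ctx)
{
    pending_q[nqueued].done = done;
    pending_q[nqueued].ctx = ctx;
    pending_q[nqueued].status = status;
    nqueued++;
}

/* Reactor loop: activate each completion routine in turn.  Completions run
 * one at a time, which is why no threads or locks are needed. */
static void run_reactor(void)
{
    for (int i = 0; i < nqueued; i++)
        pending_q[i].done(pending_q[i].ctx, pending_q[i].status);
    nqueued = 0;
}

/* Example completion: store the status into an int the caller owns. */
static void store_status(void *ctx, int status) { *(int *)ctx = status; }
```

Because all state lives in the registered context, the protocol code reads as a sequence of short completion routines rather than threads blocked mid-call.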

Perl for our prototype

State Machine Approach

  • Introduce POE: Perl Object Environment

    • can dynamically create sessions

    • hand blocking operations to the POE kernel

    • sessions have:

      • parents

      • state on a heap (or inline, in object or class)

    • sessions do:

      • post events to other sessions

      • handle events posted to them
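The session model above can be mimicked in C as a table mapping state names to handlers, with `post` delivering events by name. This is only a sketch of the idea (POE itself is Perl, and its sessions also have parents and a kernel-managed event queue):

```c
#include <assert.h>
#include <string.h>

#define MAX_STATES 8

typedef void (*state_fn)(void *heap);

/* A session: named states, their handlers, and per-session "heap" state. */
struct session {
    const char *name[MAX_STATES];
    state_fn handler[MAX_STATES];
    int nstates;
    void *heap;
};

static void add_state(struct session *s, const char *name, state_fn fn)
{
    s->name[s->nstates] = name;
    s->handler[s->nstates] = fn;
    s->nstates++;
}

/* post(): deliver an event to the session's handler for that state. */
static int post(struct session *s, const char *state)
{
    for (int i = 0; i < s->nstates; i++)
        if (strcmp(s->name[i], state) == 0) {
            s->handler[i](s->heap);
            return 0;
        }
    return -1;   /* no such state */
}

/* Example handler: bump a counter kept on the session heap. */
static void bump(void *heap) { (*(int *)heap)++; }
```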

Example session: fetchfile

Fetchfile = new session( {
    init => {
        if (!have_attr) post(conn, fetch_attr, have_attr);
        else post(conn, fetch_data, complete);
    },
    have_attr => {
        if (status == success) post(conn, fetch_data, complete);
        else destruct_session(error);
    },
    new_filefetch => {
        queue_event(this);
    },
    complete => {
        reply_to_caller; handle_queue; destruct_session;
    },
    ...
} );

Wheels, drivers, filters

  • Wheels modify sessions

    • exploit asynchronous drivers e.g.:

      • read/write

      • socketfactory (accept clients)

    • filters: deliver “whole” packets e.g.:

      • full request or data packets

      • unpacked kernel requests

    • when I/O completes:

      • post to static sessions … or …

      • create dynamic session as wheel output

Our wheels...

  • Wheels:

    • Upcall: kernel requests (unpack filter)

    • Packets: network rpc/data traffic (xdr filter)

    • SocketFactory: to accept new connections

  • Instantiate request handlers:

    • net requests

    • kernel upcall requests

InterMezzo Wheels

[Slide diagram: upcalls from the Presto kernel module, network packets, and socket connects enter through the wheels (UpcallWheel, PacketWheel, SocketFactory); a ReqDispatcher session, driven also by timers, dispatches them to static and dynamic Upcall/Netreq sessions.]

Net Request Processing

[Slide diagram: a SocketFactory runs an acceptor(port) holding a list of client sessions (peer, port, etc.); on got_wheel it builds a Connection backed by a PacketWheel. Incoming req events go to the reqdispatcher, which spawns request sessions exchanging req/reply/data/endreq/enddata, with got_error routed back to the dispatcher.]



[Slide diagram: upcalls arrive via the UpcallWheel at a ReqDispatcher, which starts UpcallProcessing sessions; upcall sessions resolve paths to volumes and servers. A Server object (a connector session plus the volumes hosted there) answers get_connection/got_connection; its connector(host, port), holding a list of client sessions (peer, port, etc.), uses a SocketFactory and PacketWheel to build a Connection over which new/req/reply/data/endreq/enddata flow, with got_error handled.]

Project

See: www.inter-mezzo.org

What we have done

  • So far mostly Linux

  • 2,500 lines of C: Linux kernel code

  • 3,800 lines of Perl

  • went through 4 total rewrites!

  • Connected & disconnected: solid

  • Reintegration: mostly working

  • Usable, not many features yet

Principal targets

  • Focus on replication, not general caching

    • scalable server replication

    • laptop/desk home directory synchronization

  • Clusters

    • install & administer one machine

    • use InterMezzo to manage all of them

Forthcoming features

  • Security

  • Conflict handling

  • Better admin tools

  • Cache manager in C

  • Variants with different semantics (locking, write sharing)

  • Windows clients (?)
