Presentation Transcript


  1. www.inter-mezzo.org A new Distributed File System Peter J. Braam, braam@cs.cmu.edu Carnegie Mellon University & Stelias Computing InterMezzo, PJ Braam, CMU

  2. Overview • Joint work with • Michael Callahan & Phil Schwan • Distributed file systems • protocols, semantics, usage patterns • InterMezzo • purpose, design, implementation • Project plans

  3. Distributed File Systems

  4. Distributed File Systems • Purpose: make remote files behave as if local • Clients: receivers of files, suppliers of updates • Servers: suppliers of files, receivers of updates • Challenges: • Semantics and protocols of sharing • Performance • Implementation and correctness • Newer features: • disconnection, reconnection, server replication, validation and conflict resolution

  5. Semantics • Unix I/O model: • shared memory model • writes visible to readers immediately • last write wins • Network file systems • Weak semantics: “aging”, “timeout” (NFS, SMB) • Unix semantics: Sprite, DCE/DFS, XFS • New semantics: Coda/InterMezzo/AFS
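The “aging”/“timeout” semantics NFS and SMB use can be sketched in a few lines of C: a cached entry is trusted only within a fixed window after it was fetched, and is revalidated with the server once it ages out. This is an illustrative sketch, not code from any real client; the names `cache_entry` and `is_fresh` are invented.

```c
#include <stdbool.h>
#include <time.h>

/* Sketch of weak "aging" cache semantics: a cached attribute is
 * trusted only for ttl_seconds after it was last fetched.
 * Names are illustrative, not from NFS, SMB, or InterMezzo. */
typedef struct cache_entry {
    time_t fetched_at;   /* when the attributes were last fetched */
    int    ttl_seconds;  /* trust window, e.g. 3 or 30 seconds */
} cache_entry;

/* true while the entry may be used without asking the server */
bool is_fresh(const cache_entry *e, time_t now)
{
    return (now - e->fetched_at) < e->ttl_seconds;
}
```

Within the window a client may read stale data; that is the tradeoff the slide calls “weak semantics”, and it is why the callback/permit schemes on the next slide maintain server state instead.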

  6. Network Semantics • Propagate writes upon close: last close wins • Callbacks - guarantee currency • Client continues to use files until notified by server • No connected client ever sees stale data • Server maintains state • Permits/Tokens - guarantee exclusivity • Client propagates updates lazily until notified • A major performance gain • Server maintains state • Validation after reconnecting: version stamps
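The last bullet, validation with version stamps, reduces to a single comparison: after reconnecting, the client presents the stamp it cached, and a match with the server's current stamp means nothing changed while it was away, so no refetch is needed. A hedged sketch (a real protocol would carry stamps per file or per volume; the names here are invented):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of "validation after reconnecting: version stamps".
 * The server bumps an object's stamp on every update; a client
 * whose cached stamp still matches may keep using its cache. */
bool validate_after_reconnect(uint64_t client_stamp, uint64_t server_stamp)
{
    /* equal stamps: no update happened while the client was away */
    return client_stamp == server_stamp;
}
```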

  7. Tradeoffs • No semantics: • works amazingly well (so does C++ & US Government) • Unix semantics: • well defined, must propagate writes • not suitable with modest-bandwidth networks • suitable for SAN file systems • Network semantics: • optimal for lower-bandwidth situations, scales well • fails with heavy write/write sharing

  8. Our inspiration: Coda Features • disconnected operation • server replication • reintegration, resolution • bandwidth adaptation • good security model • write back caching

  9. Performance • Synchronous = BAD • RPCs take a long time • context switch to the cache manager takes a long time • disk writes take a long time • InterMezzo • exploits good disk file systems • normal case: speed of the local disk file system • gives the kernel autonomy • does write back caching at the kernel level

  10. InterMezzo

  11. InterMezzo Strategy • Protocol • Retain much of Coda’s protocols and semantics • Performance & scalability: • leverage disk file systems for cache: filter driver • more kernel autonomy: kernel write back cache • Implementation: • make it SIMPLE • leverage existing code: TCP, diskfs, rsync • avoid threads: use async I/O with completions

  12. InterMezzo overview [Architecture diagram: an application’s system calls cross from user level into the kernel, where the Presto VFS filter sits above the local file system and asks “data fresh?”; if not, it issues upcalls (mkdir, create, rmdir, unlink, link, …) to Lento, the Perl cache manager & server, which handles update propagation & fetching with the InterMezzo server. Local modifications are recorded in the kernel update journal (kernel modification log).]

  13. Example of kernel code

int presto_file_open(struct dentry *de)
{
        int rc;

        if (IAMLENTO) {                     /* cache mgmt: Lento itself */
                rc = bottom_fops->open(de);
                mark_dentry(de, HAVE_DATA);
                return rc;
        }
        if (!check_dentry(de, HAVE_DATA))   /* access filter: data fresh? */
                lento_open_file(de);        /* upcall: fetch the file */
        rc = bottom_fops->open(de);         /* pass through to disk fs */
        if (!IAMLENTO)
                journal("open", de->d_name); /* write back caching */
        return rc;
}

  14. Overview of functionality • Keep folder collection replicas in sync • Disconnected operation & reintegration

  15. [Diagram: connected replication] 1. Client 1 modifies a folder collection. 2. Client 1 reintegrates its journal (mkdir, create, rmdir, store, …) to the server. 3. The server forwards the journal (mkdir, create, rmdir, store, …) to Clients 2 and 3. 4. All replicators are synchronized.

  16. [Diagram: disconnected operation] 1. While Client 1 is disconnected, it journals its modifications (store, create, rmdir, …); the server retains journals for disconnected replicators. 2. On reconnect: a. the server forwards its modification journals; b. conflicts are handled; c. the client journals are reintegrated. 3. Client and server are synchronized.
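The journal-and-reintegrate steps above can be sketched in C as an append-only modification log that is replayed in order on reconnect and truncated once the server has accepted it. The record layout, size limits, and names below are assumptions for illustration, not InterMezzo’s real kernel modification log format.

```c
#include <string.h>

/* Sketch of a kernel modification log ("KML"): each local update
 * appends a record; on reconnect the log is replayed in order. */
enum kml_op { KML_MKDIR, KML_CREATE, KML_RMDIR, KML_UNLINK, KML_STORE };

struct kml_record {
    enum kml_op op;
    char        path[256];
};

#define KML_MAX 128
static struct kml_record kml_log[KML_MAX];
static int kml_len = 0;

/* append one record; returns its index, or -1 if the log is full */
int kml_append(enum kml_op op, const char *path)
{
    if (kml_len >= KML_MAX)
        return -1;
    kml_log[kml_len].op = op;
    strncpy(kml_log[kml_len].path, path, sizeof kml_log[kml_len].path - 1);
    kml_log[kml_len].path[sizeof kml_log[kml_len].path - 1] = '\0';
    return kml_len++;
}

/* replay in append order via a caller-supplied transport; the log is
 * truncated after reintegration; returns records successfully sent */
int kml_reintegrate(int (*send)(const struct kml_record *))
{
    int sent = 0;
    for (int i = 0; i < kml_len; i++)
        if (send(&kml_log[i]) == 0)
            sent++;
    kml_len = 0;
    return sent;
}

/* stand-in transport that always succeeds, for demonstration */
static int send_ok(const struct kml_record *r) { (void)r; return 0; }
```

Replaying in append order matters: a `create` must reach the server before the `store` that writes the new file’s data.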

  17. File Service Protocol

  18. Client Server Protocol • File Service: • FetchDir • FetchFile • Modification Service: • Reintegrate • Consistency: • GetPermit/BreakPermit • Validate/BreakCallback
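The protocol surface above is small enough to write down as a message enum plus a classifier into the three service groups the slide names. The opcode names and values are invented for illustration; only the grouping comes from the slide.

```c
/* Sketch of the client/server protocol: one opcode per operation,
 * grouped into the three services named on the slide. */
enum imz_opcode {
    IMZ_FETCH_DIR, IMZ_FETCH_FILE,        /* file service */
    IMZ_REINTEGRATE,                      /* modification service */
    IMZ_GET_PERMIT, IMZ_BREAK_PERMIT,     /* consistency */
    IMZ_VALIDATE, IMZ_BREAK_CALLBACK
};

enum imz_service { IMZ_SVC_FILE, IMZ_SVC_MODIFICATION, IMZ_SVC_CONSISTENCY };

/* map each opcode to the service that handles it */
enum imz_service imz_classify(enum imz_opcode op)
{
    switch (op) {
    case IMZ_FETCH_DIR:
    case IMZ_FETCH_FILE:  return IMZ_SVC_FILE;
    case IMZ_REINTEGRATE: return IMZ_SVC_MODIFICATION;
    default:              return IMZ_SVC_CONSISTENCY;
    }
}
```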

  19. Client/Server Symmetry • Typical use: • client A fetches files • server fetches modified files from client A • client A sends modification log to server • server sends modification log to replicators • Code reuse: both need • Modification Log & File Service • Policy different on client and server

  20. InterMezzo implementation • Coda & XFS experience: • threading is complicated • distributed state & locking: hard to track • don’t implement your own cache • don’t accumulate 500,000 lines of C • Lento: learn from other efforts: • Ericsson, Teapot, XFS, ACE: async request processing • completion routines & state machine • verify protocol correctness with Murphi • high-level language or framework

  21. Blocking operations • Disk & network I/O • Proactive Reactor: • start asynchronous operation • give continuation & context to reactor • reactor activates completion routine • Advantages: • avoid threading, locking • very concise code describing protocols • state localized
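The proactor pattern on this slide can be sketched in C: the caller starts an operation without blocking and leaves a continuation (function pointer plus context) for the reactor to invoke when the I/O completes, so state lives with the operation rather than in a thread’s stack. All names here are illustrative; a real reactor would also queue the I/O with the OS.

```c
#include <stddef.h>

/* Sketch of async I/O with completion routines. */
typedef void (*completion_fn)(void *ctx, int status);

struct async_op {
    completion_fn done;  /* continuation */
    void         *ctx;   /* state localized with the operation */
};

/* start: record the continuation; a real reactor would also hand
 * the I/O to the OS here (e.g. nonblocking sockets or POSIX AIO) */
void async_start(struct async_op *op, completion_fn done, void *ctx)
{
    op->done = done;
    op->ctx  = ctx;
}

/* invoked by the reactor once the underlying I/O has finished */
void async_complete(struct async_op *op, int status)
{
    op->done(op->ctx, status);
}

static void record_status(void *ctx, int status)
{
    *(int *)ctx = status;
}

/* demo: start an op, then simulate the reactor reporting success */
int async_demo(void)
{
    struct async_op op;
    int result = -1;
    async_start(&op, record_status, &result);
    async_complete(&op, 0);
    return result;
}
```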

  22. PERL for our prototype

  23. State Machine Approach • Introduce POE: Perl Object Environment • can dynamically create sessions • hand blocking operations to the POE kernel • sessions have: • parents • state on a heap (or inline, in object or class) • sessions do: • post events to other sessions • handle events posted to them

  24. Example session: fetchfile

Fetchfile = new session( {
    init => {
        if (!have_attr) post(conn, fetch_attr, have_attr);
        else            post(conn, fetch_data, complete);
    },
    have_attr => {
        if (status == success) post(conn, fetch_data, complete);
        else { destruct_session(error); }
    },
    new_filefetch => { queue_event(this); },
    complete => {
        reply_to_caller;
        handle_queue;
        destruct_session;
    },
    ...
} );

  25. Wheels, drivers, filters • Wheels modify sessions • exploit asynchronous drivers, e.g.: • read/write • socketfactory (accept clients) • filters: deliver “whole” packets, e.g.: • full request or data packets • unpacked kernel requests • when I/O completes: • post to static sessions … or … • create dynamic session as wheel output
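What a filter does, delivering only “whole” packets no matter how the bytes arrive off the wire, is easy to show in C. This sketch assumes a 4-byte big-endian length prefix per packet, which is an invented framing for illustration, not InterMezzo’s actual wire format.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of a packet filter: raw bytes arrive in arbitrary chunks;
 * a packet is handed up only once the whole frame is buffered. */
struct frame_filter {
    uint8_t buf[1024];
    size_t  used;
};

/* driver side: append whatever the socket produced */
void filter_feed(struct frame_filter *f, const uint8_t *data, size_t len)
{
    if (f->used + len <= sizeof f->buf) {
        memcpy(f->buf + f->used, data, len);
        f->used += len;
    }
}

/* filter side: copy out one whole payload if present, else -1
 * ("keep reading"); consumed bytes are shifted out of the buffer */
int filter_get(struct frame_filter *f, uint8_t *out, size_t outsz)
{
    if (f->used < 4)
        return -1;
    uint32_t n = (uint32_t)f->buf[0] << 24 | (uint32_t)f->buf[1] << 16
               | (uint32_t)f->buf[2] << 8  | (uint32_t)f->buf[3];
    if (f->used < 4 + n || n > outsz)
        return -1;
    memcpy(out, f->buf + 4, n);
    memmove(f->buf, f->buf + 4 + n, f->used - 4 - n);
    f->used -= 4 + n;
    return (int)n;
}
```

The session above the filter never sees a half-read request, which is what keeps the per-state handlers so short.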

  26. Our wheels... • Wheels: • Upcall: kernel requests (unpack filter) • Packets: network rpc/data traffic (xdr filter) • SocketFactory: to accept new connections • Instantiate request handlers: • net requests • kernel upcall requests

  27. [Diagram: InterMezzo wheels] The kernel (Presto) and network sockets feed upcalls, packets, and connects through the wheels (UpcallWheel, PacketWheel, SocketFactory) and timers into static sessions such as the ReqDispatcher, which in turn instantiates dynamic sessions.

  28. [Diagram: net request processing] A SocketFactory acceptor(port) maintains a list of client sessions (peer, port, etc.); on _start each Connection receives a PacketWheel via got_wheel. Incoming requests reach the reqdispatcher, which spawns request sessions driving the req / reply / data / endreq / enddata exchange, with got_error paths back to the dispatcher.

  29. [Diagram: upcall processing] The UpcallWheel posts got_upcall events to the ReqDispatcher, which starts upcall sessions that resolve paths to volumes & servers. An UpcallProcessing session obtains a connection (get_connection / got_connection / got_error) from a server object (a connector session plus the volumes hosted there); the connector(host, port) uses a SocketFactory and PacketWheel, and the session then drives the req / reply / data / endreq / enddata exchange.

  30. Project See: www.inter-mezzo.org

  31. What we have done • So far mostly Linux • 2,500 lines of C: Linux kernel code • 3,800 lines of Perl • went through 4 total rewrites! • Connected & disconnected: solid • Reintegration: mostly working • Usable, not many features yet

  32. Principal targets • Focus on replication, not general caching • scalable server replication • laptop/desktop home directory synchronization • Clusters • install & administer one machine • use InterMezzo to manage all of them

  33. Forthcoming features • Security • Conflict handling • Better admin tools • Cache manager in C • Variants with different semantics (locking, write sharing) • Windows clients (?) • …
