
Caching in Distributed File System



  1. Caching in Distributed File System Ke Wang CS614 – Advanced System Apr 24, 2001

  2. Key requirements of a distributed system • Scalability from small to large networks • Fast and transparent access to a geographically Distributed File System (DFS) • Information protection • Ease of administration • Wide support from a variety of vendors

  3. Background • DFS -- a distributed implementation of a file system, where multiple users share files and storage resources. • Overall storage space managed by a DFS is composed of different, remotely located, smaller storage spaces • There is usually a correspondence between constituent storage spaces and sets of files

  4. DFS Structure • Service – a software entity providing a particular type of function to clients • Server – service software running on a single machine • Client – a process that can invoke a service using a set of operations that forms its client interface

  5. Why caching? • Retain the most recently accessed disk blocks in memory • Repeated accesses to a block in the cache can be handled without involving the disk • Advantages - reduces delays - reduces contention for the disk arm
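The block-retention idea above is essentially an LRU cache keyed by block. A minimal sketch (hypothetical names; real buffer caches also track dirty bits and pin blocks during I/O):

```python
from collections import OrderedDict

class BlockCache:
    """LRU cache of disk blocks, keyed by (file_id, block_no)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # (file_id, block_no) -> bytes
        self.hits = self.misses = 0

    def read(self, file_id, block_no, fetch_from_disk):
        key = (file_id, block_no)
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)       # mark most recently used
            return self.blocks[key]
        self.misses += 1
        data = fetch_from_disk(file_id, block_no)   # the expensive disk access
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used
        return data

cache = BlockCache(capacity=2)
disk = lambda f, b: f"data-{f}-{b}".encode()
cache.read("a", 0, disk)   # miss: goes to disk
cache.read("a", 0, disk)   # hit: served from cache, disk not involved
print(cache.hits, cache.misses)   # 1 1
```

The repeated access is the whole point: the second `read` never touches the disk.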

  6. Caching in DFS • Advantages • Reduces network traffic • Reduces server contention • Problem • Cache consistency

  7. Stuff to consider • Cache location (disk vs. memory) • Cache Placement (client vs. server) • Cache structure (block vs. file) • Stateful vs. Stateless server • Cache update policies • Consistency • Client-driven vs. Server-driven protocols

  8. Practical Distributed Systems • NFS: Sun’s Network File System • AFS: Andrew File System (CMU) • Sprite FS: File System for the Sprite OS (UC Berkeley)

  9. Sun’s Network File System(NFS)

  10. Sun’s Network File System (NFS) • Originally released in 1985 • Built on top of UDP, an unreliable datagram protocol (later versions run over TCP) • Client-server model

  11. Andrew File System(AFS) • Developed at CMU since 1983 • Client-server model • Key software: Vice and Venus • Goal: high scalability (5,000-10,000 nodes)

  12. Andrew File System(AFS)

  13. Andrew File System (AFS) • VICE is a multi-threaded server process, with each thread handling a single client request • VENUS is the client process that runs on each workstation and forms the interface with VICE • Both are user-level processes

  14. Prototype of AFS • One server process per client • Clients cache entire files • Timestamp verified with the server on every open • -> a lot of interaction with the server • -> heavy network traffic

  15. Improving AFS • To improve on the prototype: • Reduce cache validity checks • Reduce the number of server processes • Reduce network traffic •  Higher scalability!

  16. Sprite File System • Designed for networked workstations with large physical memories (possibly diskless) • Expects memories of 100-500 MB • Goal: high performance

  17. Caches in Sprite FS

  18. Caches in Sprite FS (cont) • When a process makes a file access, it is presented first to the cache (file traffic). If it cannot be satisfied there, the request is passed either to a local disk, if the file is stored locally (disk traffic), or to the server where the file is stored (server traffic). Servers also maintain caches to reduce disk traffic.
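That lookup path can be sketched as a small routing function (hypothetical names; the real Sprite code is an in-kernel implementation):

```python
class ServerCache:
    """Stand-in for a Sprite file server: its own cache in front of its disk."""
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def read(self, key):
        if key not in self.cache:          # server cache miss -> disk traffic
            self.cache[key] = self.disk[key]
        return self.cache[key]

def read_block(key, client_cache, local_disk, server):
    if key in client_cache:                # satisfied by the cache: file traffic
        return client_cache[key]
    if key in local_disk:                  # file stored locally: disk traffic
        data = local_disk[key]
    else:                                  # file stored remotely: server traffic
        data = server.read(key)
    client_cache[key] = data               # keep a copy for later accesses
    return data

server = ServerCache(disk={"remote.txt": b"r"})
client_cache = {}
read_block("remote.txt", client_cache, {"local.txt": b"l"}, server)
print(client_cache)    # {'remote.txt': b'r'}
```

Note the two cache layers: the client cache absorbs repeated file traffic, and the server's own cache absorbs repeated disk traffic.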

  19. Caching in Sprite FS • Two unusual aspects • Guarantee complete consistent view • Concurrent write sharing • Sequential write sharing • Cache size varies dynamically

  20. Cache Location: Disk vs. Main Memory • Advantages of disk caches • More reliable • Cached data survive a crash and do not need to be fetched again during recovery

  21. Cache Location: Disk vs. Main Memory (cont) • Advantages of main-memory caches: • Permit workstations to be diskless • Faster access • Server caches (used to speed up disk I/O) are always in main memory; using main-memory caches on the clients permits a single caching mechanism for servers and clients

  22. Cache Placement: Client vs. Server • A client cache reduces network traffic • Read-only operations on unchanged files need not go over the network • A server cache reduces server load • The cache is amortized across all clients (but needs to be bigger to be effective) • In practice, you need BOTH!

  23. Cache structure • Block basis • Simple • Sprite FS, NFS • File basis • Reduces interaction with servers • AFS • Cannot access files larger than the cache

  24. Comparison • NFS: client memory (disk), block basis • AFS: client disk, file basis • Sprite FS: client memory, server memory, block basis

  25. Stateful vs. Stateless Server • Stateful – servers hold information about their clients • Stateless – servers maintain no state information about clients

  26. Stateful Servers • Mechanism • Client opens a file • Server fetches information about the file from its disk, stores it in memory, and gives the client a unique connection id for the open file • The id is used for subsequent accesses until the session ends

  27. Stateful Servers (cont) • Advantages: • Fewer disk accesses • Read-ahead is possible • RPCs are small, containing only an id • A file may be cached entirely on the client and invalidated by the server if there is a conflicting write

  28. Stateful Servers (cont) • Disadvantages: • The server loses all its volatile state in a crash • It must restore state by a dialog with clients, or abort operations that were underway when the crash occurred • The server needs to be aware of client failures

  29. Stateless Server • Each request must be self-contained • Each request identifies the file and the position in the file • No need to establish and terminate a connection with open and close operations

  30. Stateless Server (cont) • Advantages • A file server crash does not affect clients • Simple • Disadvantages • Impossible to enforce consistency • Each RPC must contain all state, so requests are longer
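The contrast between the two request shapes can be sketched as follows (toy classes, not any real NFS/AFS API; dicts stand in for disks and RPCs):

```python
class StatefulServer:
    """AFS/Sprite style: client opens once, then sends a short id."""
    def __init__(self, files):
        self.files = files
        self.open_table = {}          # volatile state: lost if the server crashes
        self.next_id = 0

    def open(self, path):
        self.next_id += 1
        self.open_table[self.next_id] = path
        return self.next_id           # small id used in later RPCs

    def read(self, fid, offset, count):
        path = self.open_table[fid]   # state is looked up on the server
        return self.files[path][offset:offset + count]

class StatelessServer:
    """NFS style: every request carries the file and position itself."""
    def __init__(self, files):
        self.files = files            # no per-client state at all

    def read(self, path, offset, count):
        # self-contained request: nothing to lose in a server crash
        return self.files[path][offset:offset + count]

files = {"/etc/motd": b"hello world"}
sf = StatefulServer(files)
fid = sf.open("/etc/motd")
print(sf.read(fid, 0, 5))                              # b'hello'
print(StatelessServer(files).read("/etc/motd", 6, 5))  # b'world'
```

The stateful RPC is shorter (`fid` instead of a full path) but depends on `open_table` surviving; the stateless RPC is longer but the server can reboot between any two requests without the client noticing.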

  31. Stateful vs. Stateless • AFS and Sprite FS are stateful • Sprite FS servers keep track of which clients have which files open • AFS servers keep track of the contents of clients’ caches • NFS is stateless

  32. Cache Update Policy • Write-through • Delayed-write • Write-on-close (variation of delayed-write)

  33. Cache Update Policy (cont) • Write-through – all writes are propagated to stable storage immediately • Reliable, but poor performance

  34. Cache Update Policy (cont) • Delayed-write – modifications are written to the cache and written through to the server later • Write-on-close – modifications are written back to the server when the file is closed • Reduces intermediate read and write traffic while the file is open

  35. Cache Update Policy (cont) • Pros of delayed-write/write-on-close • Many files have lifetimes of less than 30 s • Redundant writes are absorbed • Many small writes can be batched into larger writes • Disadvantage: • Poor reliability; unwritten data may be lost when the client crashes
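A minimal sketch of delayed-write with a 30-second flush window (the delay NFS and Sprite use), showing how redundant writes to the same block are absorbed. All names are hypothetical; a dict stands in for the server:

```python
import time

class DelayedWriteCache:
    """Buffer writes and flush dirty blocks only after they age past DELAY."""

    DELAY = 30.0   # seconds, as in Sprite/NFS

    def __init__(self, server, clock=time.monotonic):
        self.server = server
        self.clock = clock            # injectable clock so tests can fast-forward
        self.dirty = {}               # key -> (data, time of first write)

    def write(self, key, data):
        # redundant writes to the same block are absorbed right here
        self.dirty[key] = (data, self.clock())

    def sync(self):
        now = self.clock()
        for key, (data, t) in list(self.dirty.items()):
            if now - t >= self.DELAY:
                self.server[key] = data     # one write-back instead of many
                del self.dirty[key]

server = {}
t = [0.0]
cache = DelayedWriteCache(server, clock=lambda: t[0])
cache.write("b1", b"v1")
cache.write("b1", b"v2")      # absorbs the first write entirely
cache.sync()                  # too early: nothing reaches the server
t[0] = 31.0
cache.sync()
print(server)                 # {'b1': b'v2'}
```

The downside named on the slide is visible in the code: anything still in `dirty` when the client crashes is simply lost.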

  36. Caching in AFS • Key to Andrew’s scalability • Clients cache entire files on local disk • Write-on-close • Server load and network traffic are reduced • The server is contacted only on open and close • The cache is retained across reboots • Requires a local disk large enough for the cache

  37. Cache update policy • NFS and Sprite use delayed-write • Flush after a delay of 30 seconds • AFS uses write-on-close • Reduces traffic to the server dramatically •  Good scalability for AFS

  38. Consistency • Is the locally cached copy of the data consistent with the master copy? • Is there a danger of “stale” data? • Is concurrent write sharing permitted?

  39. Sprite: Complete Consistency • Concurrent write sharing • A file is open on multiple clients • At least one client writes it • The server detects this • Requires dirty blocks to be written back to the server • Invalidates the open caches

  40. Sprite: Complete Consistency • Sequential write sharing • A file is modified, closed, then opened by another client • Problem: out-of-date cached blocks • Solution: compare the cached version number with the server’s on open • Problem: the current data may still be in another client’s cache • Solution: the server keeps track of the last writer
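The version-number check on open can be sketched like this (hypothetical names; a real Sprite server would also fetch dirty blocks from the last writer, which is omitted here):

```python
class VersionedServer:
    """Toy server that bumps a file's version number on every write."""
    def __init__(self):
        self.data, self.versions = {}, {}

    def write(self, path, contents):
        self.data[path] = contents
        self.versions[path] = self.versions.get(path, 0) + 1

    def version(self, path):
        return self.versions[path]

    def fetch(self, path):
        return self.data[path]

def open_file(path, client_cache, server):
    current = server.version(path)          # cheap version check on every open
    cached = client_cache.get(path)
    if cached is not None and cached[1] == current:
        return cached[0]                    # cached data are up to date
    contents = server.fetch(path)           # stale or absent: refetch
    client_cache[path] = (contents, current)
    return contents

srv = VersionedServer()
srv.write("/shared", b"v1")
cache_a = {}
open_file("/shared", cache_a, srv)          # client A caches version 1
srv.write("/shared", b"v2")                 # another client modifies and closes
print(open_file("/shared", cache_a, srv))   # b'v2': stale copy discarded
```

The mismatch between the cached version (1) and the server's version (2) is exactly what lets client A detect the sequential write sharing.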

  41. AFS: session semantics • Session semantics in AFS • Writes to an open file are invisible to other clients • Once the file is closed, the changes are visible to new opens anywhere • Other file operations are visible immediately • Guarantees consistency only for sequential (not concurrent) sharing

  42. Consistency • Sprite guarantees complete consistency • AFS uses session semantics • NFS does not guarantee consistency • NFS is stateless: all operations involve contacting the server; if the server is unreachable, reads and writes fail

  43. Client-driven vs. Server-driven • Client-driven approach • The client initiates the validity check • The server checks whether the local data are consistent with the master copy • Server-driven approach • The server records which files each client caches • When the server detects a potential inconsistency, it must react

  44. AFS: server-driven • Callback (key to scalability) • A cached file is valid as long as the client holds a callback on it • The server notifies the client (breaks the callback) before the file is modified • After a client reboots, all cached files are suspect • Dramatically reduces cache-validation requests to the server
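The callback mechanism can be sketched as follows (toy classes, not the real VICE/VENUS protocol; method calls stand in for RPCs):

```python
class CallbackServer:
    """AFS-style callbacks: remember which clients cache each file and
    break the callbacks before any modification becomes visible."""

    def __init__(self):
        self.callbacks = {}           # path -> set of caching clients

    def fetch(self, path, client):
        # handing out the file also registers a callback promise
        self.callbacks.setdefault(path, set()).add(client)
        client.has_callback[path] = True

    def store(self, path, writer):
        holders = self.callbacks.pop(path, set())
        for client in holders - {writer}:
            client.has_callback[path] = False   # callback broken by the server
        self.callbacks[path] = {writer}         # the writer's copy stays valid

class Client:
    def __init__(self):
        self.has_callback = {}

    def is_cache_valid(self, path):
        # no validation RPC needed: valid as long as we hold the callback
        return self.has_callback.get(path, False)

server = CallbackServer()
c1, c2 = Client(), Client()
server.fetch("/f", c1)
server.fetch("/f", c2)
server.store("/f", c2)        # c2 writes the file back on close
print(c1.is_cache_valid("/f"), c2.is_cache_valid("/f"))   # False True
```

Note the reversal of responsibility: `is_cache_valid` is a purely local check, so opens of unchanged files generate no server traffic at all, which is where the scalability win comes from.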

  45. Client-driven vs. Server-driven • AFS is server-driven (callbacks) • This contributes to AFS’s scalability • Whole-file caching and session semantics also help • NFS and Sprite are client-driven • Increased load on the network and server

  46. AFS: Effect on scalability

  47. Sprite: Dynamic cache size • Make the client cache as large as possible • The virtual memory system and the file system negotiate over physical memory • Each compares the age of its oldest page • Two problems • Double caching • Multiblock pages

  48. Why not callback in Sprite?

  49. Why not callbacks in Sprite? • The estimated improvement is small • Reason • Andrew’s Venus is a user-level process • Sprite is a kernel-level implementation

  50. Comparison
