
Distributed File Systems


Presentation Transcript


  1. Distributed File Systems

  2. DFS
  • A distributed file system is a module that implements a common file system shared by all nodes in a distributed system
  • A DFS should offer
    • network transparency
    • high availability
  • Key DFS services
    • file server (stores files; handles read/write requests)
    • name server (maps names to stored objects)
    • cache manager (file caching at clients and/or servers)

  3. DFS Mechanisms
  • Mounting
  • Caching
  • Hints
  • Bulk data transfers

  4. DFS Mechanisms
  • Mounting
    • name space = collection of names of stored objects, which may or may not share a common name resolution mechanism
    • mounting binds a name space to a name (mount point) in another name space
    • mount tables maintain the map of mount points to stored objects
    • mount tables can be kept at clients or servers (see the sketch after this slide)
  • Caching
    • amortizes the access cost of remote or disk data over many references
    • can be done at clients and/or servers
    • caches can be in main memory or on disk
    • reduces delays (disk or network) in accessing stored objects
    • reduces server load and network traffic
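The mount-table mechanism can be made concrete with a small sketch. The code below is a hypothetical, simplified mount table kept at a client: it maps mount points to the server and root object of the name space bound there, and pathname resolution switches servers whenever it crosses a mount point. The names (MOUNT_TABLE, resolve, serverA/serverB) are illustrative only, not part of any particular DFS.

```python
# Hypothetical client-side mount table: mount point -> (server, root object
# of the name space bound there). Such tables can be kept at clients or servers.
MOUNT_TABLE = {
    "/":        ("serverA", "rootA"),
    "/usr/doc": ("serverB", "rootB"),
}

def resolve(path: str):
    """Walk the pathname one component at a time; whenever the path walked
    so far is a mount point, resolution crosses into the name space bound
    there, and the rest of the path is resolved relative to its root."""
    comps = [c for c in path.strip("/").split("/") if c]
    server, root = MOUNT_TABLE["/"]
    rest_index, walked = 0, ""
    for i, comp in enumerate(comps):
        walked += "/" + comp
        if walked in MOUNT_TABLE:          # crossed a mount point
            server, root = MOUNT_TABLE[walked]
            rest_index = i + 1
    return server, root, comps[rest_index:]

# e.g. resolve("/usr/doc/readme") -> ("serverB", "rootB", ["readme"])
```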

  5. DFS Mechanisms
  • Hints
    • caching introduces the problem of cache consistency
    • ensuring cache consistency is expensive
    • cached information can instead be used as a hint (e.g., the mapping of a name to a stored object), to be validated when it turns out to be wrong
  • Bulk data transfers
    • the overhead of executing network protocols is high
    • network transit delays are small in comparison
    • solution: amortize protocol-processing overhead and disk seek times and latencies over many file blocks

  6. Name Resolution Issues
  • Naming schemes
    • host:filename
      • simple and efficient
      • no location transparency
    • mounting
    • single global name space
      • uniqueness of names requires cooperating servers
    • context-aware naming
      • partition the name space into contexts
      • name resolution is always performed with respect to a given context
  • Name servers
    • a single name server, or
    • different name servers for different parts of the name space

  7. Caching Issues
  • Main memory caches
    • faster access
    • diskless clients can also use caching
    • a single cache design can serve both client and server caches
    • compete with the virtual memory manager for physical memory
    • cannot completely cache large stored objects
    • block-level caching is complex to implement
    • cannot be used by portable clients
  • Disk caches
    • remove some of the drawbacks of main memory caches

  8. Caching Issues
  • Writing policy (the two extremes are contrasted in the sketch after this slide)
    • write-through
      • every client write is performed at the server immediately
    • delayed writing
      • client writes are reflected to the stored objects at the server after some delay
      • many writes are absorbed in the cache
      • writes to short-lived objects never reach the servers
        • 20-30% of new data are deleted within 30 secs
      • lost data on a client crash is an issue
    • delayed writing until file close
      • most files are open only for a short time
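To make the policy distinction concrete, here is a minimal sketch contrasting write-through with delayed writing until file close. The names are hypothetical and `server.write_block` is an assumed RPC; note how data deleted before close never reaches the server under the delayed policy.

```python
class WriteThroughCache:
    """Every client write is sent to the server immediately (and cached).
    `server.write_block` is an assumed RPC; names are hypothetical."""
    def __init__(self, server):
        self.server, self.blocks = server, {}

    def write(self, file_id, block_no, data):
        self.blocks[(file_id, block_no)] = data
        self.server.write_block(file_id, block_no, data)   # synchronous update


class WriteOnCloseCache:
    """Delayed writing until file close: writes stay in the cache, so data
    deleted before the close never generates server traffic."""
    def __init__(self, server):
        self.server, self.dirty = server, {}

    def write(self, file_id, block_no, data):
        self.dirty[(file_id, block_no)] = data             # absorbed locally

    def delete(self, file_id):
        self.dirty = {k: v for k, v in self.dirty.items() if k[0] != file_id}

    def close(self, file_id):
        for (fid, bno), data in list(self.dirty.items()):
            if fid == file_id:
                self.server.write_block(fid, bno, data)    # flushed only now
                del self.dirty[(fid, bno)]
```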

  9. Caching Issues
  • Approaches to the cache consistency problem
    • server-initiated
      • servers inform client cache managers whenever their cached data become stale
      • servers need to keep track of who has cached which file blocks
    • client-initiated
      • clients validate cached data with servers before using it
      • partially negates the benefits of caching
    • disable caching when concurrent-write sharing is detected (see the sketch after this slide)
      • concurrent-write sharing: multiple clients have a file open, with at least one of them having it open for writing
    • avoid concurrent-write sharing altogether by using locking
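A sketch of the detection approach, under the assumption that the server sees every open and close: concurrent-write sharing is detected at open time and the opening client is told whether the file may be cached. All names are hypothetical.

```python
from collections import defaultdict

class FileServer:
    """Hypothetical sketch of server-side detection of concurrent-write
    sharing: a file open by several clients, at least one of them for
    writing, is made uncacheable. A real server would also notify clients
    that already cache the file."""

    def __init__(self):
        self.opens = defaultdict(list)      # file_id -> list of (client, mode)

    def open(self, client, file_id, mode):
        self.opens[file_id].append((client, mode))
        entries = self.opens[file_id]
        writers = sum(1 for _, m in entries if m == "w")
        cacheable = not (len(entries) > 1 and writers >= 1)
        return {"file_id": file_id, "cacheable": cacheable}

    def close(self, client, file_id, mode):
        self.opens[file_id].remove((client, mode))
```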

  10. More Cache Consistency Issues
  • The sequential-write sharing problem
    • occurs when a client opens a (previously opened) file that has recently been modified and closed by another client
    • causes problems because
      • the opening client may still have outdated file blocks in its cache
      • the other client may not yet have written its modified cached file blocks back to the file server
  • Solutions (the first is sketched after this slide)
    • associate file timestamps with all cached file blocks; at file open, request the current file timestamp from the file server
    • the file server asks the client with the modified cached blocks to flush its data to the server when another client opens the file for writing
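A sketch of the timestamp solution, assuming a `get_timestamp(file_id)` call at the file server: each cached block remembers the file timestamp it was read under, and at open the client drops any block of that file cached under an older timestamp. All names are hypothetical.

```python
class ClientCache:
    """Hypothetical client cache illustrating timestamp validation at open."""

    def __init__(self, server):
        self.server = server
        self.blocks = {}   # (file_id, block_no) -> (data, file timestamp when cached)

    def open(self, file_id):
        current = self.server.get_timestamp(file_id)       # assumed server RPC
        self.blocks = {
            key: (data, ts) for key, (data, ts) in self.blocks.items()
            if key[0] != file_id or ts == current           # keep only up-to-date blocks
        }
        return current
```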

  11. Availability Issues
  • Replication can help increase data availability
    • but it is expensive, due to the extra storage for replicas and the overhead of keeping the replicas consistent
  • Main problems
    • maintaining replica consistency
    • detecting replica inconsistencies and recovering from them
    • handling network partitions
    • placing replicas where they are needed
    • keeping the rate of deadlocks small and availability high

  12. Availability Issues
  • Unit of replication
    • complete file or file block
      • allows replication of only the data that are needed
      • replica management is harder (locating replicas, ensuring file protection, etc.)
    • volume (group) of files
      • wasteful if many of the files are not needed
      • replica management is simpler
    • pack: a subset of the files in a user's primary pack
  • Mutual consistency among replicas (see the voting sketch after this slide)
    • let the most current replica be the replica with the highest timestamp in a quorum
    • use voting to read/write replicas and keep at least one replica current
    • only votes from most current replicas are valid
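A minimal sketch of quorum voting for mutual consistency, assuming each replica exposes `read()` returning (timestamp, data) and `write(timestamp, data)`: choosing read and write quorum sizes r and w with r + w > N guarantees that every read quorum overlaps every write quorum, so the highest timestamp seen in any read quorum identifies the most current replica. All names are hypothetical.

```python
class ReplicatedFile:
    """Hypothetical quorum-voting sketch over a list of replica objects."""

    def __init__(self, replicas, r, w):
        # r + w > N: every read quorum intersects every write quorum, so a
        # read quorum always contains at least one current replica.
        assert r + w > len(replicas) and 2 * w > len(replicas)
        self.replicas, self.r, self.w = replicas, r, w

    def read(self):
        # a real system would use any r reachable replicas; the slice is a simplification
        votes = [rep.read() for rep in self.replicas[:self.r]]
        return max(votes, key=lambda v: v[0])     # most current = highest timestamp

    def write(self, data):
        ts, _ = self.read()                       # find the current version first
        for rep in self.replicas[:self.w]:
            rep.write(ts + 1, data)               # install the new version in a write quorum
```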

  13. Scalability & Semantic Issues
  • Caching & cache consistency
    • take advantage of file usage patterns
      • many widely used and shared files are accessed in read-only mode
      • the data a client needs are often found in another client's cache
    • organize client caches and file servers into a hierarchy for each file
    • implement file servers, name servers, and cache managers as multithreaded processes
  • Semantics
    • common file system semantics: each read operation returns the data due to the most recent write operation
    • providing these semantics in a DFS is difficult and expensive

  14. NFS

  15. NFS
  • Interfaces
    • file system interface
    • virtual file system (VFS) interface
      • vnodes uniquely identify objects in the file system
      • vnodes contain mount table information (pointers to the parent FS and to mounted file systems)
    • RPC and XDR (external data representation)

  16. NFS Naming and Location
  • Filenames are mapped to the objects they represent at first use
    • the mapping is done at the servers by sequentially resolving each element of a pathname, using the vnode information, until a file handle is obtained

  17. NFS Caching
  • File caching
    • read-ahead and 8KB file blocks are used
    • files or file blocks are cached together with the timestamp of their last update
    • cached blocks are assumed valid for a preset time period
    • block validation is performed at the server at file open and after the timeout (see the sketch after this slide)
    • upon detecting an invalid block, all blocks of the file are discarded
    • a delayed writing policy is used, with modified blocks flushed to the server upon file close
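A sketch of this validity check, under the assumption of a `get_mtime` call at the server: a cached block is trusted for a preset period; at open, or once the period has expired, the client compares its recorded last-modified time with the server's, and a mismatch discards every cached block of that file. The names and the 3-second default are illustrative, not NFS's actual interface.

```python
import time

class NfsStyleBlockCache:
    """Hypothetical client block cache with timeout-based validation."""

    def __init__(self, server, ttl=3.0):
        self.server, self.ttl = server, ttl
        self.blocks = {}     # (file_id, block_no) -> data
        self.meta = {}       # file_id -> (mtime last seen, time of last check)

    def _validate(self, file_id):
        mtime, checked = self.meta.get(file_id, (None, 0.0))
        if time.time() - checked < self.ttl:
            return                               # still within the trust window
        current = self.server.get_mtime(file_id) # assumed server call
        if current != mtime:                     # stale: drop every block of the file
            self.blocks = {k: v for k, v in self.blocks.items() if k[0] != file_id}
        self.meta[file_id] = (current, time.time())

    def read(self, file_id, block_no):
        self._validate(file_id)
        return self.blocks.get((file_id, block_no))   # None means "fetch from server"
```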

  18. NFS Caching
  • Directory name lookup caching
    • maps directory names ==> vnodes
    • cached entries are updated upon lookup failure or when new information is received
  • File/directory attribute cache
    • access to file/directory attributes accounts for 90% of file requests
    • file attributes are discarded after 3 secs
    • directory attributes are discarded after 30 secs
    • directory changes are performed at the server
  • NFS servers are stateless

  19. Sprite File System
  • The name space is a single hierarchy of domains
    • each server stores one or more domains
    • domains have unique prefixes
    • mount points link domains into the single hierarchy
    • clients maintain a prefix table

  20. Sprite FS - Prefix Tables
  • Locating files in Sprite (see the sketch after this slide)
    • each client finds the longest prefix match in its prefix table, then sends the remainder of the pathname to the matching server together with the domain token from its prefix table
    • the server replies with a file token, or with a new pathname if the "file" is a remote link
    • each client request contains the filename and the domain token
    • when a client fails to find a matching prefix, or fails during a file open, it broadcasts the pathname, and the server with the matching domain replies with the domain/file token
    • entries in the prefix table are hints
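The prefix-table lookup can be sketched as follows (hypothetical names; the broadcast fallback is only indicated by a comment): find the longest matching prefix, then ship the remainder of the pathname plus the stored domain token to that server.

```python
class PrefixTable:
    """Hypothetical Sprite-style prefix table kept by a client."""

    def __init__(self):
        # prefix -> (server, domain token); entries are only hints
        self.entries = {"/": ("srvA", 1), "/users": ("srvB", 7)}

    def lookup(self, path):
        matches = [p for p in self.entries if path.startswith(p)]
        if not matches:
            return None                  # would broadcast the pathname instead
        prefix = max(matches, key=len)   # longest prefix match
        server, token = self.entries[prefix]
        remainder = path[len(prefix):].lstrip("/")
        return server, token, remainder  # server resolves `remainder` in domain `token`
```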

  21. Sprite FS - Caching
  • Client caches are in main memory
    • file block size is 4KB
    • cache entries are addressed by file token and block number, which allows
      • blocks to be added without contacting the server
      • blocks to be accessed without consulting the file's disk map to get the block's disk address
    • clients do not cache directories, to avoid inconsistencies
  • Servers have main memory caches as well
  • A delayed writing policy is used

  22. Sprite FS - Cache Writing Policy
  • Observations
    • BSD
      • 20-30% of new data live less than 30 secs
      • 75% of files are open for less than 0.5 secs
      • 90% of files are open for less than 10 secs
    • a more recent study
      • 65-80% of files are open for less than 30 secs
      • 4-27% of new data are deleted within 30 secs
  • One can reduce traffic by
    • not updating servers immediately at file close
    • not updating servers when caches are updated

  23. Sprite Cache Writing Policy
  • Delayed writing policy (see the sketch after this slide)
    • every 5 secs, flush a client's cached (modified) blocks to the server if they haven't been modified within the last 30 secs
    • flush blocks from the server's cache to disk within 30-60 secs after that
  • Replacement policy: LRU
    • 80% of the time, blocks are ejected to make room for other blocks
    • 20% of the time, to return memory to VM
    • cache blocks remain unreferenced for about 1 hour before being ejected
  • Cache misses
    • 40% on reads and 1% on writes
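A sketch of the client-side flushing rule, with hypothetical names and an assumed `server.write_block` call: a periodic scan (nominally every 5 seconds) writes back dirty blocks that have not been modified for 30 seconds.

```python
import time

class SpriteStyleWriteBack:
    """Hypothetical sketch of the delayed-writing rule described above."""

    def __init__(self, server, scan_interval=5.0, age_threshold=30.0):
        self.server = server
        self.scan_interval = scan_interval    # how often the scan runs
        self.age_threshold = age_threshold    # required idle time before write-back
        self.dirty = {}                       # (file_token, block_no) -> (data, last modified)

    def write(self, file_token, block_no, data):
        self.dirty[(file_token, block_no)] = (data, time.time())

    def scan_once(self, now=None):
        """Run every `scan_interval` seconds: flush blocks idle for 30+ secs."""
        now = time.time() if now is None else now
        for (tok, bno), (data, mtime) in list(self.dirty.items()):
            if now - mtime >= self.age_threshold:
                self.server.write_block(tok, bno, data)
                del self.dirty[(tok, bno)]
```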

  24. Sprite Cache Consistency
  • Server-initiated
  • Avoid concurrent-write sharing by disabling caching for files open concurrently for reading and writing
    • ask the client writing the file to flush its blocks
    • inform all other clients that the file is not cacheable
    • the file becomes cacheable again when all clients have closed it
  • Solve sequential-write sharing using version numbers (see the sketch after this slide)
    • each client keeps the version number of the file whose blocks it caches
    • the server increments the version number each time the file is opened for writing
    • the client is informed of the file's version number at file open
    • the server keeps track of the last writer; the server asks the last writer to flush its cached blocks if the file is opened by another client
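A server-side sketch of this version-number scheme, with hypothetical names and an assumed per-client `flush(file_id)` callback: the version is bumped on every open-for-writing, the last writer is remembered and asked to flush before a different client's open completes, and the returned version lets the opening client decide whether its cached blocks are still usable.

```python
class SpriteStyleServer:
    """Hypothetical server illustrating Sprite-style open-time checks."""

    def __init__(self):
        self.version = {}      # file_id -> current version number
        self.last_writer = {}  # file_id -> client holding possibly dirty blocks

    def open(self, client, file_id, write=False):
        last = self.last_writer.get(file_id)
        if last is not None and last is not client:
            last.flush(file_id)                    # assumed callback to the last writer
        if write:
            self.version[file_id] = self.version.get(file_id, 0) + 1
            self.last_writer[file_id] = client
        # the client compares this with the version of its cached blocks
        return self.version.get(file_id, 0)
```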

  25. Sprite VM and FS Cache Contention
  • VM and FS compete for physical memory
  • VM and FS negotiate physical memory usage
    • separate pools of blocks, using the time of last access to determine the winner; VM is given a slight preference (it loses only if a block hasn't been referenced for 20 mins)
  • Double caching is a problem
    • FS marks blocks of newly compiled code with an infinite time of last reference
    • backing files = swapped-out pages (including process state and data segments)
    • clients bypass the FS cache when reading/writing backing files

  26. CODA
  • Goals
    • scalability
    • availability
    • disconnected operation
  • Volume = collection of files and directories on a single server
    • the unit of replication
  • FS objects have a unique FID which consists of (see the sketch after this slide)
    • a 32-bit volume number
    • a 32-bit vnode number
    • a 32-bit uniquifier
  • All replicas of an FS object have the same FID
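For illustration, the FID layout can be written down as a small structure (a sketch, not Coda's actual declaration): three 32-bit fields, 96 bits in total, shared by all replicas of the object.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class Fid:
    """Coda-style file identifier: 32-bit volume number, 32-bit vnode number,
    32-bit uniquifier. All replicas of an FS object carry the same FID."""
    volume: int
    vnode: int
    uniquifier: int

    def pack(self) -> bytes:
        # 96-bit big-endian representation, purely illustrative
        return struct.pack(">III", self.volume, self.vnode, self.uniquifier)
```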

  27. CODA Location
  • Volume Location database
    • replicated at each server
  • Volume Replication database
    • replicated at each server
  • Volume Storage Group (VSG): the set of servers holding replicas of a volume
  • Venus
    • the client cache manager
    • caches on the local disk
    • AVSG = the nodes of the VSG currently accessible to the client
    • one member of the AVSG is the preferred server

  28. CODA Caching & Replication
  • Venus caches files/directories on demand
    • from the server in the AVSG with the most up-to-date data
    • on file access
    • users can indicate caching priorities for files/directories
    • users can bracket action sequences
  • Venus establishes callbacks at the preferred server for each cached FS object
  • Server callbacks
    • the server tells the client that a cached object is invalid
    • lost callbacks can happen

  29. CODA AVSG Maintenance
  • Venus tracks changes in the AVSG
    • by periodically probing every node in the VSG, it detects nodes that should be added to or removed from its AVSG
    • it removes a node from the AVSG if an operation on it fails
    • it chooses a new preferred server if needed
  • Coda Version Vector (CVV)
    • kept both for volumes and for files/directories
    • a vector with one entry for each node in the VSG, indicating the number of updates to the volume or FS object

  30. Coda Replica Management
  • State of an object or replica
    • each modification is tagged with a storeid
    • update history = sequence of storeids
    • the state is a truncated update history:
      • the latest storeid (LSID)
      • the CVV

  31. Coda Replica Management
  • Comparing replicas A & B leads to one of four cases (see the sketch after this slide)
    • LSID-A = LSID-B and CVV-A = CVV-B => strong equality
    • LSID-A = LSID-B and CVV-A != CVV-B => weak equality
    • LSID-A != LSID-B and CVV-A >= CVV-B => A dominates B
    • otherwise => inconsistent
  • When a server S receives an update for a replica from client C
    • S compares the states of its replica and of C's cached object; the check is successful if
      • for files, it leads to strong equality or dominance
      • for directories, it leads to strong equality
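The four-way comparison can be written directly. The sketch below assumes the CVVs are equal-length lists with one update counter per VSG member, and shows only A's side of the comparison (dominance of B over A is the symmetric case, obtained by swapping the arguments); the names are hypothetical.

```python
def compare_replicas(lsid_a, cvv_a, lsid_b, cvv_b):
    """Classify the relationship between two replica states as described above."""
    if lsid_a == lsid_b:
        return "strong equality" if cvv_a == cvv_b else "weak equality"
    if all(a >= b for a, b in zip(cvv_a, cvv_b)):
        return "A dominates B"
    return "inconsistent"

# e.g. compare_replicas("s2", [2, 1, 1], "s1", [1, 1, 1]) -> "A dominates B"
```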

  32. Coda Replica Management
  • When client C wants to update a replicated object
    • Phase I
      • C sends the update to every node in its AVSG
      • each node checks the replica states (of the cached object and of its own replica), informs the client of the result, and performs the update if the check succeeds
      • if the check is unsuccessful, the client pauses while the server tries to resolve the problem automatically; if that fails the client aborts, otherwise the client resumes
    • Phase II
      • the client sends the updated object state to every site in the AVSG

  33. Coda Replica Management
  • Force operation (between servers)
    • happens when Venus informs the AVSG of weak equality within the AVSG
    • the server with the dominant replica overwrites the data and state of the dominated server
    • for directories, this is done with the help of locking, one directory at a time
  • Repair operation
    • automatic; proceeds in two phases, as in an update
  • Migrate operation
    • moves inconsistent data to a covolume for manual repair

  34. Conflict Resolution
  • Conflicts between files
    • resolved by the user with the repair tool, which bypasses Coda's update rules; inconsistent files are inaccessible to CODA
  • Conflicts between directories
    • resolution uses the fact that a directory is a list of files
    • conflicts that are not resolved automatically:
      • update/update (for attributes)
      • remove/update
      • create/create (adding identical files)
    • all other conflicts can be resolved easily
  • Inconsistent objects, and objects without automatic conflict resolution, are placed in covolumes
