Distributed File Systems

farica

  1. Distributed File Systems Synchronization – 11.5 Consistency and Replication - 11.6 Fault Tolerance – 11.7

  2. 11.5: Synchronization • File System Semantics • File Locking

  3. Synchronization • Is an issue only if files are shared • Sharing in a distributed system is often necessary, and at the same time can affect performance in various ways. • In the following discussion we assume file sharing takes place in the absence of process-implemented synchronization operations such as mutual exclusion.

  4. UNIX File Semantics • In a single-processor system, any file read operation returns the result of the most recent write operation. • Even if two writes occur very close together, the next read returns the result of the last write. • It is as if all reads and writes are time-stamped from the same clock. Operation order is based on strict time ordering.

  5. UNIX Semantics in DFS • Possible to (almost) achieve IF… • There is only one server • There is no caching at the client • In this case every read and write goes directly to the server, which processes them in sequential order. • Network delays might make minor differences in wall clock ordering.

  6. Caching and UNIX Semantics • Single-server + no client caching leads to poor performance, so most file systems allow users to make local copies of files (or file blocks) that are currently in use. • Now UNIX semantics are problematic: a write executed on a local copy only will not be seen by another client that reads the file from the server, or from other clients that have the file cached.

  7. Write-Through • A possible solution is to require all changes to local copies to be immediately written to the server. • Inefficient – caching is no longer as useful • Not a total solution: what happens when two users have the same file cached?
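
The last point on this slide can be illustrated with a minimal Python sketch. It assumes a toy in-memory server; the names `FileServer` and `WriteThroughClient` are hypothetical and do not come from the slides. It suggests why write-through alone is not enough: the server is updated at once, yet a second client's cached copy remains stale.

```python
class FileServer:
    """Toy server (assumption): a dictionary of file name -> contents."""
    def __init__(self):
        self.data = {}


class WriteThroughClient:
    """Client with a local cache that writes every change through."""
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        # Serve from the local cache when possible; fetch on a miss.
        if name not in self.cache:
            self.cache[name] = self.server.data[name]
        return self.cache[name]

    def write(self, name, value):
        self.cache[name] = value
        self.server.data[name] = value  # write-through to the server


server = FileServer()
server.data["f"] = "v0"
a, b = WriteThroughClient(server), WriteThroughClient(server)
a.read("f")
b.read("f")           # both clients now have "f" cached
a.write("f", "v1")    # the server is updated immediately...
stale = b.read("f")   # ...but b still reads its old cached copy
```

Here `stale` holds the old contents even though the server already has the new ones, which is the gap that server-initiated or client-initiated validation has to close.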

  8. Consistency Models • Recall discussion of consistency models in Chapter 7 • Realistically, strict consistency or even sequential consistency can’t be easily achieved without synchronization techniques such as transactions or locks • Here we consider what the file system can do in the absence of user-enabled methods.

  9. Session Semantics • Instead of trying to implement UNIX semantics where it really is impractical, define a new semantic: • Local changes to a file are not made permanent until the file is closed. If another user opens the file, it gets the original version. • This approach is common in DFS’s. • In effect, this turns a remote-access model into an upload-download model.

  10. Simultaneous Caching • What if two users concurrently cache and modify the same file? How do we determine the “new” state of the file? • Possibilities: • The most recently closed file becomes the new “official” version (most common) • The decision is unspecified (an unlikely choice)
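
Session semantics with the "most recently closed file wins" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SessionServer` and `Session` are hypothetical names, and whole files are modeled as strings held in memory.

```python
class SessionServer:
    """Toy file server with session semantics (last close wins)."""
    def __init__(self):
        self.files = {}  # file name -> contents

    def open(self, name):
        # The client downloads a private copy of the current version.
        return Session(self, name, self.files.get(name, ""))


class Session:
    def __init__(self, server, name, data):
        self.server, self.name, self.data = server, name, data

    def read(self):
        return self.data

    def write(self, data):
        self.data = data  # only the local copy changes

    def close(self):
        # Changes become permanent (and visible to later opens) on close.
        self.server.files[self.name] = self.data


server = SessionServer()
server.files["f"] = "v0"
a = server.open("f")
b = server.open("f")
a.write("from A")
c = server.open("f")  # opened before A closes, so it gets the original
b.write("from B")
a.close()
b.close()             # B closes last, so B's version becomes official
```

After both closes the server holds B's version and A's update is lost, which is the upload-download behavior described on the previous slide.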

  11. Immutable Files • The only operations on a file are, effectively, create, read, and replace. • Once a file is created it can be read but not changed. • A new file (incorporating changes to a current file) can be created and placed in the directory instead of the original version. • If several users try to replace an existing file at the same time, one is chosen: either the last to close, or non-deterministically.

  12. Review: File System Semantics • UNIX semantics: every file operation is instantly visible to all processes • Session semantics: no changes are visible until the file is closed • Immutable files: no updates are possible; files can only be replaced

  13. Transaction Semantics • Transactions are a way of grouping several file operations together and ensuring that they are either all executed or none is executed. • We say they are atomic. • The transaction system is responsible for ensuring that all of the operations are carried out in order, without any interference from concurrent transactions.

  14. The Transaction Model • Transaction: a set of operations which must be executed entirely, or not at all. • Processes in a transaction can fail at random • Failure causes: hardware or software problems, network problems, lost messages, etc. • Transactions will either commit or abort: • Commit => successful completion (All) • Abort => partial results are undone (Nothing)

  15. Transaction Model • Transactions are delimited by two special primitives: Begin_transaction // or something similar transaction operations (read, write, open, close, etc.) End_transaction • If the transaction successfully reaches the end statement, it “commits” and all changes become permanent; otherwise it aborts.
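
The two delimiting primitives map naturally onto a Python context manager. The sketch below is a toy under stated assumptions: `Transaction` is a hypothetical class, the "file system" is a plain dictionary, and only the all-or-nothing behavior is modeled (no isolation between concurrent transactions and no durability across crashes).

```python
class Transaction:
    """Buffers writes and applies them only if the block completes."""
    def __init__(self, store):
        self.store = store
        self.pending = {}

    def __enter__(self):              # plays the role of Begin_transaction
        return self

    def read(self, key):
        # A transaction sees its own uncommitted writes.
        return self.pending.get(key, self.store.get(key))

    def write(self, key, value):
        self.pending[key] = value     # not yet visible in the store

    def __exit__(self, exc_type, exc, tb):  # plays the role of End_transaction
        if exc_type is None:
            self.store.update(self.pending)  # commit: all changes applied
        self.pending = {}                    # abort: buffered changes dropped
        return False                         # let any exception propagate


store = {"a": 100, "b": 0}

with Transaction(store) as t:         # reaches the end, so it commits
    t.write("a", t.read("a") - 30)
    t.write("b", t.read("b") + 30)

try:
    with Transaction(store) as t:     # fails part-way, so it aborts
        t.write("a", 0)
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

The first transaction moves 30 units from `a` to `b`; the second leaves the store untouched because it never reaches its end.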

  16. ACID Properties of Transactions • Atomic: either all or none of the operations in a transaction are performed • Consistent: the transaction doesn’t affect system invariants; e.g., no money “lost” in a banking system • Isolated (serializable): one transaction can’t affect others until it completes • Durable: changes made by a committed transaction are permanent, even if the process or server fails.

  17. Atomicity • An atomic action is one that appears to be “indivisible and instantaneous” to the rest of the system. For example, machine language instructions. • Transactions support the execution of multiple instructions as if they were a single atomic instruction.

  18. Consistent • A state is consistent if invariants hold • An invariant is a predicate which states a condition that must be true. • Invariants for the airline ticket example: • seatsLeft = seatsTotal – seatsSold • seatsLeft >= 0 • In the bank case (simplified) • balance_final = balance_original – withdrawals + deposits
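
The invariants on this slide can be written as predicates that a transaction system might check before committing. This is a small Python sketch; the function and parameter names are chosen for illustration only.

```python
def seats_invariants_hold(seats_total, seats_sold, seats_left):
    """Airline example: both invariants must be true together."""
    return seats_left == seats_total - seats_sold and seats_left >= 0


def balance_invariant_holds(original, withdrawals, deposits, final):
    """Simplified bank example: no money is created or lost."""
    return final == original - sum(withdrawals) + sum(deposits)


ok_state = seats_invariants_hold(100, 40, 60)       # consistent state
oversold = seats_invariants_hold(100, 101, -1)      # seatsLeft < 0
bank_ok = balance_invariant_holds(500, [100, 50], [25], 375)
```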

  19. Isolated • No other transaction will see the intermediate results of a transaction. • Concurrent transactions have the same effect on the database as if they had run serially. Notice the similarity to critical sections, which do run serially. • This characteristic is enforced through special concurrency control measures.

  20. AD Properties • ACID is a commonly used term, but somewhat redundant. • Transactions that execute atomically will be consistent and isolated. • Atomicity and durability capture the essential qualities.

  21. Semantics of File Sharing in Distributed Systems • UNIX semantics: every file operation is instantly visible to all processes • Session semantics: no changes are visible until the file is closed • Immutable files: no updates are possible; files can only be replaced • Transactions: all changes occur and are visible atomically – or not at all

  22. File Locking • UNIX file semantics are not possible in DFS • Session semantics and immutable files do not always support the kind of sharing processes need. • Transactions have a heavy overhead. • Thus some additional form of locking is desirable to enforce mutual exclusion on writes.

  23. File Locking in NFSv4 • Lock managers in NFS, as in other file systems, are based on the centralized scheme discussed in Chapter 6 • Client requests lock • Lock manager grants lock • Client releases lock (or it expires after a time) • In NFS, if a client requests a lock which cannot be granted, the client is not blocked; it must try again later.

  24. Denied Requests • If a client’s request for a lock is denied, it receives an error message. • The client can poll the server later for lock availability • Clients can request to be put on a FIFO queue; when a lock is released it is reserved for the first process on the queue; if that process polls within a certain amount of time it gets the lock.

  25. File locking in NFS • Two types of locks: • Reader locks, which can be held simultaneously, • Writer locks, which guarantee exclusive access. • The lock operation is applied to consecutive byte sequences in the file, rather than to the whole file.

  26. NFSv4 Lock Related Operations (operation: description) • Lock: create a lock for a range of bytes • Lockt: test whether a conflicting lock has been granted • Locku: remove a lock from a range of bytes • Renew: renew the lease on a lock
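
The behavior described on the last few slides (non-blocking requests, shared reader locks, exclusive writer locks, byte ranges) can be sketched as a toy centralized lock manager in Python. The method names mirror Lock, Lockt, and Locku, but the class, its signatures, and the in-memory list are assumptions made for illustration; the real NFSv4 operations carry additional state that is not modeled here.

```python
class LockManager:
    """Toy byte-range lock manager; denied requests return False."""
    def __init__(self):
        self.locks = []  # entries: (owner, start, end, exclusive), end exclusive

    def _conflicts(self, owner, start, end, exclusive):
        for o, s, e, x in self.locks:
            overlaps = s < end and start < e
            # Two reader locks never conflict; anything involving a writer does.
            if o != owner and overlaps and (exclusive or x):
                return True
        return False

    def lock(self, owner, start, length, exclusive):
        end = start + length
        if self._conflicts(owner, start, end, exclusive):
            return False  # client is not blocked; it must try again later
        self.locks.append((owner, start, end, exclusive))
        return True

    def lockt(self, owner, start, length, exclusive):
        """Test only: report whether a conflicting lock exists."""
        return self._conflicts(owner, start, start + length, exclusive)

    def locku(self, owner, start, length):
        """Remove the owner's lock on exactly this byte range."""
        key = (owner, start, start + length)
        self.locks = [l for l in self.locks if l[:3] != key]


m = LockManager()
r1 = m.lock("A", 0, 100, exclusive=False)    # reader on bytes 0..99
r2 = m.lock("B", 50, 100, exclusive=False)   # overlapping reader is fine
w1 = m.lock("C", 90, 20, exclusive=True)     # writer overlaps readers: denied
w2 = m.lock("C", 150, 10, exclusive=True)    # disjoint range: granted
t1 = m.lockt("A", 150, 5, exclusive=False)   # conflicts with C's writer lock
m.locku("C", 150, 10)
t2 = m.lockt("A", 150, 5, exclusive=False)   # no conflict after release
```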

  27. Leases • Locks are granted for a specific time interval. • At the end of that interval the lock is removed unless the client has requested an extension.
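
A lease on a lock can be sketched as an expiry time that the client must push forward before it passes. In this Python illustration the class name `LeasedLock` and the explicit `now` argument (a logical clock in seconds) are assumptions chosen to keep the example deterministic.

```python
class LeasedLock:
    """A lock that is only valid until its lease expires."""
    def __init__(self, owner, now, duration):
        self.owner = owner
        self.duration = duration
        self.expires_at = now + duration

    def is_valid(self, now):
        return now < self.expires_at

    def renew(self, now):
        # An expired lease cannot be renewed; the lock is already gone.
        if not self.is_valid(now):
            return False
        self.expires_at = now + self.duration
        return True


lock = LeasedLock("A", now=0, duration=30)
valid_at_29 = lock.is_valid(29)   # still inside the first interval
renewed = lock.renew(20)          # extension requested in time: expires at 50
valid_at_45 = lock.is_valid(45)
valid_at_50 = lock.is_valid(50)   # interval over, lock removed
late_renew = lock.renew(60)       # too late to extend
```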

  28. Share Reservations in NFS • An open request specifies the kind of access the application requires: READ, WRITE, BOTH • It also specifies the kind of access that should be denied other clients: NONE, READ, WRITE, BOTH • If requirements can’t be met, open fails • Share reservations = implicit locking

  29. Share Reservations - Example • Client tries to open a file for reading and writing, and deny concurrent write access. • If no other client has the file open, the request succeeds. • If another client has opened the file for reading, the request succeeds • If another client has opened the file for writing, the request fails. • If another client has the file open and has denied read or write access, the request fails.
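
The compatibility rule behind this example can be sketched as a bitmask check in Python. The constants and the function `open_allowed` are hypothetical names for illustration; each existing open is represented as an (access, deny) pair.

```python
NONE, READ, WRITE = 0, 1, 2
BOTH = READ | WRITE


def open_allowed(existing, access, deny):
    """Return True if a new open is compatible with all existing opens."""
    for other_access, other_deny in existing:
        if access & other_deny:   # another client denies what we ask for
            return False
        if deny & other_access:   # we would deny what another client holds
            return False
    return True


# The slide's request: open for reading and writing, deny concurrent writes.
request = (BOTH, WRITE)
no_other = open_allowed([], *request)
other_reader = open_allowed([(READ, NONE)], *request)
other_writer = open_allowed([(WRITE, NONE)], *request)
other_denies_read = open_allowed([(READ, READ)], *request)
other_denies_write = open_allowed([(READ, WRITE)], *request)
```

The five results correspond to the cases listed on the slide: the first two opens succeed and the remaining three fail.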

  30. 11.6: Consistency and Replication • Client-Side Caching • Server-Side Replication • Replication in P2P Systems

  31. Introduction • Replication (and caching) => multiple copies of something • Two reasons for replication: • Reliability (protection against failure, corruption) • Performance (size of user base, geographical extent of system) • Replication can cause inconsistency: at least one copy is different from the rest.

  32. Caching in a DFS • Caching in any DFS reduces access delays due to disk access times or network latency. • Caches can be located in the main memory of either the server or client and/or in the disk of the client • Client-side caching (memory or disk) offers most benefits, but also leads to potential inconsistencies.

  33. Cache Consistency Measures • Server-initiated consistency: server notifies client if its data becomes stale • e.g., another client closes its copy of the file, which was opened for writing. • Client-initiated consistency: client is responsible for consistency of data • e.g., client side software can periodically check with server to see if file has been modified.

  34. Caching in NFS • NFSv3 did not define a caching protocol. • Different implementations led to different results. • “Stale” data – data that doesn’t agree with the data at the server – could exist for periods ranging from a few seconds to ½ minute

  35. Cache Consistency Problem • How can stale data (relative to server) be avoided? • NFSv4 does not improve the system enormously, but there are some changes • Many details are still implementation dependent. • General structure – next slide

  36. Client Side Caching in NFS • [Figure 11-21. Components shown: client application, memory cache, disk cache, network, NFS server.]

  37. What Do Clients Cache? • File data blocks • File handles – for future reference • Directories

  38. Caching File Data • The simplest approach to caching allows the server to retain control over the file. • Procedure • Client opens file • Data blocks are transferred to the client (by read ops) • Client can read and write data in the cache. • When the file closes, flush changes back to server • Session semantics & NFS: the last (most recent) process to close a file has its changes become permanent. Changes made by processes that run concurrently are lost.

  39. Caching with Server Control • In caching with server control • All clients on a single machine may read and write the same cached data if they have access rights • Data remaining in the cache after a file closes doesn’t need to be removed, although changes must be sent to the server. • If a new client on the same machine opens a file after it has been closed, the client cache manager usually must validate local cached data with the server • If the data is stale, replace it.

  40. Caching With Open Delegation • Allows a client machine to handle some local open and close operations from other clients on the same machine. • Normally the server decides if a client can open a file • Delegation can improve performance by limiting contact with the server • The client machine gets a copy of the entire file, not just certain blocks.

  41. Open delegation – Examples* • Suppose a client machine has opened a file for writing, and has been delegated rights to control the file locally. • If another local client tries to lock the file, the local machine can decide whether or not to grant the lock • If a remote client tries to lock the file (at the server) the server will deny file access • If a client has opened the file for reading only, local clients desiring write privileges must still contact the server.

  42. Delegation and Callbacks • Server may need to “undelegate” the file – perhaps when another client needs to obtain access. • This can be done with a callback, which is essentially an RPC from server to client. • Callbacks require the server to maintain state (knowledge) about clients – a reason for NFS to be stateful.

  43. Caching Attributes* • Clients can cache attributes as well as data. • (size of file, number of links, last date modified, etc.) • Cached attributes are kept consistent by the client, if at all • No guarantee that the same file cached at two sites will have the same attributes at both sites • Attribute modifications should be written through to the server (write through cache coherence policy), although there’s no requirement to do so

  44. Leases* • Lease: cached data is automatically invalidated after a certain period of time. • Applies to file attributes, file handles (mapping of name to file handle), directories, and sometimes data. • When lease expires, must renew data from server • Helps with consistency.

  45. An Implementation of Leases* • Data blocks have time-stamps applied by the server that indicate when they were last modified. • When a block is cached at a client, the server’s time-stamp is also cached. • After a period of time, the client confirms the validity of the data • Compare timestamp at the client to timestamp at server • If server timestamp is more recent, invalidate client data
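
The timestamp comparison on this slide can be sketched as client-initiated validation in Python. `BlockServer` and `ClientCache` are hypothetical names, and the server's modification time is modeled as a simple counter rather than a wall-clock time.

```python
class BlockServer:
    """Toy server that stamps each block with a modification counter."""
    def __init__(self):
        self.blocks = {}  # block id -> (data, mtime)
        self.clock = 0

    def write(self, block_id, data):
        self.clock += 1
        self.blocks[block_id] = (data, self.clock)

    def read(self, block_id):
        return self.blocks[block_id]      # data together with its timestamp

    def mtime(self, block_id):
        return self.blocks[block_id][1]


class ClientCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # block id -> (data, server timestamp at fetch)

    def read(self, block_id):
        if block_id not in self.cache:
            self.cache[block_id] = self.server.read(block_id)
        return self.cache[block_id][0]

    def validate(self, block_id):
        """Invalidate the cached block if the server's copy is newer."""
        cached = self.cache.get(block_id)
        if cached is not None and self.server.mtime(block_id) > cached[1]:
            del self.cache[block_id]
            return False
        return cached is not None


server = BlockServer()
server.write(1, "old")
client = ClientCache(server)
first = client.read(1)
server.write(1, "new")            # another client updates the block
before = client.read(1)           # still the stale cached copy
was_valid = client.validate(1)    # lease check finds a newer server stamp
after = client.read(1)            # refetched from the server
```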

  46. Coda: A Prototype Distributed File System • Developed at CMU – M. Satyanarayanan • Started in 1987 as an improvement on the Andrew file system (a classic research FS) • Most recent version of Coda (6.9.3) was released 1/11/2008 (http://www.coda.cs.cmu.edu/news.html)

  47. Objectives of Coda • Support disconnected operation (server goes down, laptop is disconnected from network, etc.) • Client side caching is extensive • Uses client disk cache • Replication contributes to availability, fault tolerance, scalability

  48. Caching in Coda • Critical, because of Coda’s objectives • Caching achieves scalability; provides more fault tolerance for the client in case it is disconnected from the server. • When a client opens a file, the entire file is downloaded. This is true for reads and writes.

  49. Concurrent Access • In Coda, many clients may have a file open for reading, but only one for writing. • Multiple readers and single writer may exist concurrently • In NFS and most other file systems, multiple readers and multiple writers can exist concurrently.

  50. Callbacks/Server Initiated Cache Consistency • A Coda callback is an agreement between the server and a client. Server agrees to notify client when a file has been modified by another client. • At this time, the client may purge the file from its cache, but it may also continue reading the outdated copy. • This is a blend of session and transaction semantics.
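
The callback agreement can be sketched as follows in Python. This is a toy model, not Coda's actual protocol: `CallbackServer`, `CachingClient`, and `callback_break` are hypothetical names, and whole files are strings held in memory.

```python
class CallbackServer:
    """Toy server that remembers which clients it promised to notify."""
    def __init__(self):
        self.files = {}
        self.callbacks = {}  # file name -> set of clients holding a promise

    def fetch(self, client, name):
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, writer, name, data):
        self.files[name] = data
        # Notify every other client that cached the file, then forget them.
        for client in self.callbacks.pop(name, set()):
            if client is not writer:
                client.callback_break(name)
        self.callbacks.setdefault(name, set()).add(writer)


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.valid = set()   # files whose callback promise still holds

    def open(self, name):
        if name not in self.valid:
            self.cache[name] = self.server.fetch(self, name)
            self.valid.add(name)
        return self.cache[name]

    def close_after_write(self, name, data):
        self.cache[name] = data
        self.valid.add(name)
        self.server.store(self, name, data)

    def callback_break(self, name):
        # The outdated copy stays in the cache but is no longer trusted.
        self.valid.discard(name)


server = CallbackServer()
server.files["f"] = "v1"
a, b = CachingClient(server), CachingClient(server)
a_first = a.open("f")
b.open("f")
b.close_after_write("f", "v2")   # server breaks a's callback
a_outdated = a.cache["f"]        # a may keep reading the old copy...
a_reopen = a.open("f")           # ...but a new open fetches the update
```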
