
Distributed File Systems



  1. Distributed File Systems CSE5306 Lecture Quiz Due 23 June 2014 at 5 PM

  2. Distributed File Systems • Distributed file systems enable multiple processes to share data over long periods of time in a secure and reliable way. • They are the basic layer for most distributed systems and applications.

  3. Architecture • Distributed file systems are organized as… • Traditional client-server architectures, or • Fully decentralized architectures.

  4. Client-Server Architectures • Sun Microsystems’ Network File System (NFS) is the de facto standard for UNIX systems; it coexists with, and extends, each machine’s local file system. • NFS’ “remote file model” (above left) gives clients transparent access to files on remote servers. • By contrast, the “upload/download model” (above center), used by the Internet’s FTP, transfers whole files to the client. • NFS replaces local UNIX file system interfaces with its own Virtual File System (above right). Its “NFS client” and “NFS server” mediate all remote file accesses via RPCs.
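
To make the two access models concrete, here is a minimal sketch contrasting them. The classes and the RPC stub are hypothetical, not NFS’s actual interfaces: the remote file model ships every operation to the server, while the upload/download model moves the whole file to the client.

```python
# Sketch of the two file-access models (hypothetical classes, not real NFS APIs).

class RemoteAccessFile:
    """Remote file model: every operation becomes an RPC to the server."""
    def __init__(self, rpc, handle):
        self.rpc = rpc          # stub that forwards calls to the file server
        self.handle = handle    # opaque server-issued file handle

    def read(self, offset, nbytes):
        return self.rpc.call("read", self.handle, offset, nbytes)

    def write(self, offset, data):
        return self.rpc.call("write", self.handle, offset, data)


class UploadDownloadFile:
    """Upload/download model: fetch the whole file, edit locally, write it back."""
    def __init__(self, rpc, name):
        self.rpc = rpc
        self.name = name
        self.data = bytearray(rpc.call("download", name))  # whole-file transfer

    def read(self, offset, nbytes):
        return bytes(self.data[offset:offset + nbytes])    # purely local

    def write(self, offset, data):
        self.data[offset:offset + len(data)] = data        # purely local

    def close(self):
        self.rpc.call("upload", self.name, bytes(self.data))  # push file back
```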

  5. File System Model • Like UNIX’ file system model, NFS hierarchically organizes files into directories. • Files have names, but NFS knows them by their handles. • Every file also has many attributes; e.g., users’ access rights, length, last-modified date. • NFS has UNIX-like file operations (see above).

  6. R U O K ? • Which of the following is NOT true of Sun Microsystems’ Network File System (NFS)? • NFS is the de facto standard file system for all UNIX systems. • NFS extends the local file system by replacing its local interfaces with global ones. • NFS’ flat file system departs from UNIX’ hierarchical file system model. • NFS gives clients transparent access to remote servers’ files. • NFS uploads and downloads files like FTP.

  7. Cluster-Based Distributed File Systems • Striping is writing consecutive file blocks to different servers (b above), so that they can be read concurrently. • The Google File System (GFS, above right): • Assumes at least one of the data center’s >20K servers is down at any time. • Each multi-gigabyte file is edited by appending. • A GFS master supervises 300 clustered “chunk servers.” • A chunk server stores 64-MB file chunks. • Chunks are replicated to guard against chunk-server crashes. • A GFS client gives a file name and chunk index to the master, which returns contact addresses for all servers storing that chunk. • A client updates a file at the nearest server, which pushes the update to the others.
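
A minimal sketch of that lookup, under the assumptions above (method names such as lookup and read_chunk are hypothetical; the real GFS protocol carries more metadata): the client contacts the master only for metadata, then talks to chunk servers directly.

```python
# Sketch of a GFS-style metadata lookup (hypothetical names, not Google's API).

CHUNK_SIZE = 64 * 1024 * 1024  # 64-MB chunks, as in GFS

def read(master, path, offset, nbytes):
    """Read nbytes at offset: ask the master where the chunk lives,
    then fetch the data from one of the chunk servers directly."""
    chunk_index = offset // CHUNK_SIZE
    # The master returns only metadata: a chunk handle plus replica addresses.
    handle, replicas = master.lookup(path, chunk_index)
    server = replicas[0]  # e.g., pick the nearest replica
    return server.read_chunk(handle, offset % CHUNK_SIZE, nbytes)
```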

  8. Symmetric Architectures • “Symmetric” organizations share files as peers: • They distribute files with a DHT (p.188). • They use keys (p.???) to look them up. • Ivy builds its file system on top of a distributed storage layer (see above): • Ivy’s NFS-like file system is on top. • A series of write updates is stored in each user’s linked-list log. • A read operation applies every logged write to the file block and returns the result. • Kosha does the same with whole files, placed in directories identified by nodes’ hash codes. • DHash is a fully distributed block-oriented storage layer: • 8KB data blocks are replicated at the k immediate successor servers. • A content-hash block is identified by the secure hash of the block’s content. • A public-key block is signed with its author’s private key. • Chord provides basic DHT-based decentralized lookup facilities. Tanenbaum’s Fig. 11-6, p.499
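
The content-hash idea is easy to sketch. Below is a toy DHash-style block store; the dict stands in for the distributed storage layer, and SHA-1 is assumed as the secure hash. Because a block’s key is the hash of its content, any node can verify a fetched block without trusting the server that returned it.

```python
import hashlib

# Sketch of a DHash-style content-hash block store (toy, in-memory).

def put_block(store, data: bytes) -> str:
    key = hashlib.sha1(data).hexdigest()   # SHA-1 assumed as the secure hash
    store[key] = data                      # DHash replicates to k successors
    return key

def get_block(store, key: str) -> bytes:
    data = store[key]
    if hashlib.sha1(data).hexdigest() != key:
        raise ValueError("block corrupted or forged")  # self-verifying
    return data

blocks = {}                                # stand-in for the distributed store
k = put_block(blocks, b"8KB of file data...")
assert get_block(blocks, k) == b"8KB of file data..."
```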

  9. Processes • Early NFS servers were stateless, so crash recovery required no return to a previous state. • NFSv4 maintains minimal state for… • locking files, • authenticating clients, • caching files that traverse WANs, • handling RPC callbacks.

  10. Communication • Distributed file systems communicate via RPCs. • The RPC interface makes them independent of… • operating systems, • networks and • transport protocols.

  11. R U O K ? 2. How does Google’s massive GFS file system tolerate inevitable failures? • A master server distributes Google’s multi-gigabyte files among hundreds of available chunk servers, which do all the work. • The master server retains only file metadata in a simple, single-level table in main memory. • Servers seldom fail, because of their light loading and tight environmental control. • File striping distributes redundant chunks of each file across many disk drives. • Both a and b above.

  12. RPCs in NFS • NFS clients talk to file servers via the Open Network Computing (ONC) RPC protocol. • WAN communications were slow when the old NFSv3 needed two RPCs to look up and read a file (above left). NFSv4’s compound procedures use only one RPC (above right). • Any failure stops the whole series of operations and returns an error message. • Write-write conflicts with other clients are not resolved.
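
A sketch of how such a compound might be evaluated on the server side (the op names and execute interface are hypothetical), showing the stop-on-first-failure rule:

```python
# Sketch of an NFSv4-style compound procedure (hypothetical op encoding).
# Several operations travel in one RPC; evaluation stops at the first failure
# and the results gathered so far come back with an error status.

def run_compound(server, ops):
    results = []
    for op, args in ops:                      # e.g., ("LOOKUP", ...), ("READ", ...)
        status, value = server.execute(op, args)
        results.append((op, status, value))
        if status != "OK":                    # any failure stops the series
            return "ERROR", results
    return "OK", results

# One round trip instead of one RPC per operation:
# run_compound(server, [("PUTROOTFH", ()), ("LOOKUP", ("usr",)),
#                       ("OPEN", ("f",)), ("READ", (0, 4096))])
```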

  13. The RPC2 Subsystem • Coda’s RPC2 client code forks new thread(s) to build a reliable RPC, or many MultiRPCs, upon the unreliable User Datagram Protocol (UDP). The RPC or MultiRPC blocks until its server answers. • Opening a server’s video file prompts Coda’s RPC2s at the client and server to call side-effect routines that open an isochronous connection and rapidly transfer data. • When a file is modified, Coda servers broadcast invalidation notices to all clients having copies.

  14. File-Oriented Communication in Plan 9 • The Plan 9 file-based distributed system presents networks and processes as UNIX-like files; e.g., TCP appears as the six files above. • Commands look like file writes: • Writing “connect 192.31.231.42!23” to the ctl file opens a telnet session on a distant server’s port 23. • Writing to the data file, e.g., res = write(fd, buf, nbytes), sends a packet; reading from it receives one. • Reading the listen file blocks until a connection request arrives.
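
The following sketch approximates that file-based style in Python. The /net paths and the “!”-separated dial string follow Plan 9 conventions; the code is illustrative and will not run outside a Plan 9-like environment.

```python
# Sketch of Plan 9's file-based networking, approximated in Python.
# Everything is a file: opening a TCP connection is a write to a ctl file.

conn_dir = "/net/tcp/0"                      # one directory per connection

with open(conn_dir + "/ctl", "w") as ctl:
    ctl.write("connect 192.31.231.42!23")    # dial a telnet server's port 23

with open(conn_dir + "/data", "r+b") as data:
    data.write(b"hello\r\n")                 # writes send bytes on the connection
    reply = data.read(1024)                  # reads receive bytes
```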

  15. Naming • NFS is representative of how all distributed file systems name their hierarchies of files and directories.

  16. Naming in NFS • NFS allows a client to virtually import part of a server’s file system. • A file imported by one client may get a different directory path than the same file imported by another (above left), so all clients are encouraged to import files into identical local root subdirectories; e.g., /root/usr/bin/. • NFS does not allow a client to import files that its server itself imported from another server (above right). A server may, however, export files from its own multiple file systems.

  17. R U O K ? 3. Which of the following is true of NFS? • NFS enables a client to import files that its server imported from another server. • NFS only appears to import a client’s desired part of a server’s file system. • NFS is unique among distributed file systems in its naming of file hierarchies and directories. • When one part of a long series of NFS file operations fails, the results of the operations prior to the failure are returned to the client. • NFS resolves file write-write conflicts with other clients.

  18. File Handles • NFSv4’s server-unique 128-byte file “handles” are invisible to clients. • They are “true identifiers” (p.181) of files inside the file system, which clients can use as opaque indices without repeatedly looking up the file’s given name and directory path. • The initial search for a file’s handle begins at the root identified by NFSv4’s putrootfh command.
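
A sketch of how a client might exploit those opaque handles (the client methods are hypothetical; real NFSv4 bundles these steps into compound RPCs): resolve the pathname once, then reuse the handle for every later operation.

```python
# Sketch: resolving a name to an opaque handle once, then reusing it.

handle_cache = {}

def open_by_path(client, path):
    if path not in handle_cache:
        fh = client.putrootfh()                # start at the server's root
        for component in path.strip("/").split("/"):
            fh = client.lookup(fh, component)  # walk one component at a time
        handle_cache[path] = fh                # opaque bytes; never inspected
    return handle_cache[path]

# Later reads skip name resolution entirely:
# fh = open_by_path(client, "/usr/bin/f"); client.read(fh, 0, 4096)
```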

  19. Automounting • A “mount point” is the directory path of a file (p.200). • Alice has a virtual directory called /home/ that has subdirectories called /Alice/ and /Bob/. When she logs in, her files are transparently imported (“automounted,” see upper left) from the NFS server, where they actually reside. When she reads one of Bob’s public files, it gets transparently automounted too. • But importing one file at a time is too much work. NFS’ client-based automounter imports Bob’s whole directory just once (upper right).

  20. Constructing a Global Name Space • What if a grid (i.e., a loosely federated cluster of many organizations’ clients) wants to merge its various file systems…? • The proposed Global Name Space (GNS) could join all of their directory trees with five (or more) types of “junctions,” which resemble traditional file system mount points. • For example, http://www.cs.vu.nl/index.htm could be a physical file name (see last table entry above).

  21. Synchronization • Synchronizing shared files can degrade a distributed file system’s performance. • Semantics and file locking issues must be considered. • Coda handles both of these well.

  22. Semantics of File Sharing • A single processor’s “UNIX semantics” demands that a file read see the results of the most recent write, in “absolute time” (above center). • Distributed systems must deal with network delays, multiple file servers and their caches. • Even clients may cache copies of heavily used files. Writing to its own cache may render another client’s cached copy obsolete (above left). • The widely used “session semantics” rule states that any client who opens a file may freely write to it, but others will not see those changes till the file is closed. • Three (poor, Ed.) alternatives also appear in the table (above right).
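
Session semantics can be captured in a few lines. The toy model below (in-memory, not a real protocol) shows a writer’s changes staying invisible to concurrently open sessions until close:

```python
# Sketch of session semantics (toy in-memory model):
# a client's writes stay private to its session until the file is closed.

class Server:
    def __init__(self):
        self.files = {"f": b"v1"}

class Session:
    def __init__(self, server, name):
        self.server, self.name = server, name
        self.copy = server.files[name]             # private copy at open()

    def write(self, data):
        self.copy = data                           # invisible to other sessions

    def close(self):
        self.server.files[self.name] = self.copy   # publish on close

srv = Server()
a, b = Session(srv, "f"), Session(srv, "f")
a.write(b"v2")
assert b.copy == b"v1"                   # b still sees the old contents
a.close()
assert Session(srv, "f").copy == b"v2"   # visible to sessions opened later
```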

  23. R U O K ? 4. True or False: An example of “UNIX semantics” is other clients do not see changes in a file until the editing client closes the file. • True. • False.

  24. File Locking • NFSv4’s lock manager enables many writers to lock mutually exclusive segments of one file, and many readers to lock the same segment (above left). • Locks are granted in FIFO order upon request. • Locks are automatically released when their leases expire. • File access can be read, write or both (above right). • State variables remember the access types of already-open files, and they affect access grants to other Windows clients for the same file.
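
On UNIX clients, such byte-range locks are typically requested through POSIX advisory locks, which NFS forwards to its lock manager. A brief sketch (the filename is illustrative):

```python
import fcntl, os

# Sketch: byte-range locking with POSIX advisory locks. Many readers may
# share a range; a writer holds it exclusively.

fd = os.open("shared.dat", os.O_RDWR | os.O_CREAT)

# Exclusive (write) lock on bytes 0..4095; blocks until granted.
fcntl.lockf(fd, fcntl.LOCK_EX, 4096, 0, os.SEEK_SET)
try:
    os.pwrite(fd, b"x" * 4096, 0)        # safe: no other writer holds 0..4095
finally:
    fcntl.lockf(fd, fcntl.LOCK_UN, 4096, 0, os.SEEK_SET)

# Shared (read) lock on the same range; many clients may hold this at once.
fcntl.lockf(fd, fcntl.LOCK_SH, 4096, 0, os.SEEK_SET)
data = os.pread(fd, 4096, 0)
fcntl.lockf(fd, fcntl.LOCK_UN, 4096, 0, os.SEEK_SET)
os.close(fd)
```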

  25. Sharing Files in Coda • NFS retains the changes made by the last user to close a file—all others’ changes are lost. • Coda allows one user to open a file for writing, even if one or more users already have opened the file for reading (see above). • When the writer closes the file, it is returned to the file server, and the readers continue to see an obsolete copy for the duration of their (transactional) sessions. Closing and reopening a session refreshes the file.

  26. Consistency and Replication • Client-side caching and file server replication raise consistency design issues in distributed file systems, especially those subject to network delays in wide-area networks.

  27. Client-Side Caching • NFS, Coda and mobile devices handle client-side caching differently.

  28. Caching in NFS • NFS allows a client to open and close files, as well as handle locks, for its several users (i.e., “open delegation,” above left). • The file server recalls those delegation rights (via RPC callback) when some other client requests access to a delegate’s file (above right). • Leases on delegated rights reduce consistency problems.

  29. R U O K ? 5. Which of the following is NOT true of NFSv4’s file locking? • Many writers may lock mutually exclusive segments. • Many readers may lock the same segment. • Locks are granted in last-in, first-out order upon request. • Locks are automatically released when leases expire. • The current state of an already-open file affects other Windows clients’ access grants for the same file.

  30. Client-Side Caching in Coda • By reducing dependence on file servers, Coda’s client-side caching… • Aids scalability. • Increases fault tolerance. • Coda keeps track of its clients’ file versions. • It promises to send an invalidation message (i.e., an RPC “callback break”) to all other users of a file when any one of them updates it. • A user who opens her cached file, or opens a session with the Coda file server, also receives any pending invalidation message.

  31. Client-Side Caching for Portable Devices • An explicit upload/download model maintains files on portable devices, which connect to the network intermittently. • Or the devices can fetch files from a local, immobile part of their distributed file system. • Storing a cryptographic hash of file content on a portable device enables it to quickly see if the file is locally available and up-to-date. • Files that will be needed while the device is offline must be pre-fetched.
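
A sketch of that hash check (the cache layout is hypothetical; SHA-256 is assumed as the hash):

```python
import hashlib

# Sketch: a portable device decides whether its cached copy of a file is
# both present and current by comparing cryptographic hashes, instead of
# contacting a file server.

def cached_copy_is_current(cache_path: str, expected_sha256: str) -> bool:
    try:
        with open(cache_path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
    except FileNotFoundError:
        return False                       # not locally available
    return digest == expected_sha256       # available and up to date

# Before going offline, pre-fetch anything that fails this check.
```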

  32. Server-Side Replication • File servers are replicated for fault tolerance. • Clients’ performance improves when they cache files, which reduces their dependence on file servers and their need for server replication. • Especially in applications with a low read/write ratio, replication reduces file servers’ performance: • Every update must be replicated at many servers. • Synchronizing concurrent updates is costly.

  33. Server Replication in Coda • Coda stores one “volume” (i.e., a UNIX partition) at each file server. All servers holding a copy of a volume are called its “Volume Storage Group” (VSG), and the currently reachable VSG members are a client’s “Accessible Volume Storage Group” (AVSG). • The Read-One-Write-All (ROWA, Fig. 7-22, p.313) protocol keeps Coda’s replicas consistent. • Clients read from any AVSG server but write updates to all of them (servers 1 & 2 for client A, and server 3 for client B, above). • When the broken network is healed, the servers compare their files’ vector timestamps and regain consistency in an application-dependent way, which may require user interaction.
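
The post-partition comparison can be sketched with toy version vectors (Coda’s actual version vectors carry more bookkeeping):

```python
# Sketch: comparing vector timestamps after a network partition heals.

def compare(v1, v2):
    """Return 'equal', 'v1-dominates', 'v2-dominates', or 'conflict'."""
    ge = all(a >= b for a, b in zip(v1, v2))
    le = all(a <= b for a, b in zip(v1, v2))
    if ge and le:
        return "equal"
    if ge:
        return "v1-dominates"      # v1's server saw every update v2 did
    if le:
        return "v2-dominates"
    return "conflict"              # concurrent updates: resolve per application

# Servers on both sides of a partition updated the file independently:
assert compare([2, 1, 1], [1, 1, 2]) == "conflict"
```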

  34. Replication in Peer-to-Peer File Systems • Peer-to-peer files are… • Read-only, and • Updated by loading new files into the system. • Replication is important for… • Speeding up search and lookup requests. • Load balancing among nodes. • Peers can be… • Unstructured, which need faster searches, or • Structured, which need load balancing.

  35. R U O K ? 6. Which of the following is NOT true of client-side caching? • It reduces dependence on file servers. • It aids scalability. • It increases fault tolerance. • It sustains mobile devices while offline. • The clients’ replication makes them more fault tolerant.

  36. Unstructured Peer-to-Peer Systems • To reduce search times, files can be replicated among unstructured peers in 3 ways: • According to their popularity in each network region. • Uniformly regardless of popularity. • Combine 1 & 2: each user finds an interesting file and makes it available to her community (social networking).

  37. Structured Peer-to-Peer Systems • To balance the work load among its structured peers, a node… • replicates its most popular file to a node far upstream in the usual search path, and • sends all intervening nodes a pointer to the replica. • Those nodes also may advertise their own files. • Recent pointers crowd out old ones in small, fast caches.

  38. File Replication in Grid Systems • CERN operates a worldwide grid of variously organized scientific data servers and clients. • Its enormous read-only data files get replicated at every FTP destination. • Replica location services like Globus (p.380ff) use DHT-based Chord to find local contact addresses for requested files.

  39. R U O K ? 7. True or False: Unstructured peer-to-peer file systems replicate files uniformly and/or according to their popularity, but structured peer-to-peer file systems replicate popular files upstream in the usual search path and send all intervening nodes a pointer to the replica. • True. • False.

  40. Fault Tolerance • Replication creates fault-tolerant groups of file servers.

  41. Handling Byzantine Failures • Unbounded communication delays lead to arbitrary Byzantine failures in file systems. • The protocol shown above tolerates k Byzantine failures: • Request—Client sends request to 3k+1 NFS file servers. • Pre-prepare—Master multicasts a proposed sequence number, to properly order the associated operation. • Prepare—Every slave multicasts its acceptance to every other. • Commit—At least 2k others confirm their acceptances. After receiving this “quorum certificate,” all execute the operation. • Reply—At least k+1 file servers send results to client. • The client accepts identical answers from k+1 servers.
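
A quick worked check of those quorum sizes, using the 3k+1 bound stated in the protocol above:

```python
# With n replicated servers, the protocol tolerates k Byzantine failures
# only when n >= 3k + 1, so the best achievable k is (n - 1) // 3.

def byzantine_tolerance(n: int) -> int:
    return (n - 1) // 3

n = 22
k = byzantine_tolerance(n)
print(k)          # 7 failures tolerated (3*7 + 1 = 22)
print(2 * k)      # 14 acceptances form the "quorum certificate"
print(k + 1)      # 8 identical replies convince the client
```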

  42. High Availability in P2P Systems • Erasure coding saves an m-block file as n > m fragments (i.e., with n − m redundant fragments), any m of which suffice to reconstruct the file. • The fragments consume less cable bandwidth than sending multiple full file copies (see above). • With per-node availability a, the file’s unavailability under erasure coding is ε_ec = 1 − Σ_{i=m}^{n} C(n,i) a^i (1−a)^{n−i}, and its redundancy factor is r_ec = n/m. • Under replication with r_rep full copies, the unavailability is ε_rep = (1−a)^{r_rep}. • For m = 5, the figure above shows r_rep/r_ec > 1: replication needs more redundancy to reach the same availability.
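
The two formulas are easy to evaluate. The sketch below compares erasure coding and replication at the same redundancy factor (the parameter values are illustrative):

```python
from math import comb

# Sketch: comparing erasure coding with whole-file replication for the same
# redundancy factor, using the formulas above.

def avail_erasure(a: float, m: int, n: int) -> float:
    """P(file available) when any m of n fragments reconstruct it."""
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(m, n + 1))

def avail_replication(a: float, r: int) -> float:
    """P(file available) with r independent full copies."""
    return 1 - (1 - a)**r

a = 0.5                                   # each node is up half the time
m, n = 5, 15                              # redundancy r_ec = n/m = 3
print(avail_erasure(a, m, n))             # ~0.94
print(avail_replication(a, 3))            # 0.875 at the same redundancy 3
```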

  43. Security • In spite of the scalability problems of a centralized service, NFS file servers typically use a Kerberos authentication service (Fig. 9-23, p.413) and do their own authorization.

  44. Security in NFS • NFS relies upon secure RPCs to secure the channel between client and server (above). • Then NFS uses each file’s access control attributes to verify each client’s rights.

  45. Secure RPCs • NFSv4 is layered on top of the RPCSEC_GSS general security framework (see above). • RPCSEC_GSS sets up secure channels, with message integrity and confidentiality, plus Kerberos authentication. • RPCSEC_GSS is layered on top of GSS-API, which is layered on top of a middleware’s RPC system. • LIPKEY authenticates servers with public keys and clients with passwords.

  46. R U O K ? 8. How many Byzantine failures can a group of 22 replicated file servers tolerate? • 8. • 7. • 14. • 16. • None of the above.

  47. Access Control • NFS controls users’ and groups’ file access via each file’s access control list (ACL, p.415). • NFS’ “synchronize” operation allows a server’s own process to directly access a file in any way it wants, so as to improve performance. • NFS also may allow unauthenticated users access (see above).

  48. Decentralized Authentication • To solve the scaling problems associated with Kerberos’ centralized authentication service, the Secure File System (SFS, above left) decentralizes authentication. • SFS’ self-certifying pathnames (above right) carry all the information needed to authenticate the file server: a client hashes the server’s public key and verifies that the result equals the 160-bit host identifier embedded in the pathname. • Public keys can be obtained anywhere, and certification authorities need only be widely respected, thus separating key management from file system security. • Naming transparency can be provided by symbolic file links, which in turn can be provided by certification authorities.
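
A sketch of that verification step (simplified: real SFS derives the host ID from the hostname and the public key together; the pathname layout here is illustrative):

```python
import hashlib

# Sketch: verifying an SFS-style self-certifying pathname. The pathname
# embeds the server's address and a host ID that must equal a hash of the
# server's public key, so no external authority is needed.

def host_id(public_key: bytes) -> str:
    return hashlib.sha1(public_key).hexdigest()   # 160-bit identifier

def authenticate(pathname: str, server_public_key: bytes) -> bool:
    # e.g., "/sfs/sfs.cs.example.edu:<host-id>/home/alice"
    claimed = pathname.split("/")[2].split(":")[1]
    return claimed == host_id(server_public_key)

# The client fetches the server's public key over an insecure channel and
# trusts it only if it hashes to the ID already embedded in the pathname.
```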

  49. Secure P2P File-Sharing Systems • Security is complicated in fully decentralized peer-to-peer file-sharing systems. • It must rely on collaboration among the peers.

  50. Secure Lookups in DHT-Based Systems • Secure routing must deal with 3 issues: • Nodes must be assigned identifiers in a secure way. • Sybil attack—a node assigns itself many identifying keys. • Eclipse attack—a node takes over all of your nearest neighbors, cutting you off from legitimate file access. • Routing tables must be securely maintained. • Attackers tell good nodes to point at malicious nodes. • Lookup requests must be securely forwarded between nodes. • Attackers can derail a message traversing a single route.
