Distributed File Systems

By Pravin D'Souza

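What is a Distributed File System?
• Allows transparent access to remote files over a network.
• Examples:
  - Network File System (NFS) from Sun Microsystems.
  - Remote File Sharing (RFS) from AT&T.
  - Andrew File System (AFS) from CMU.
• Centralized file system vs. distributed file system: a centralized file system keeps its files on the local machine, while a distributed file system lets files stored on other machines be used as if they were local.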

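Properties of Distributed File Systems
• Network transparency: remote files are accessed with the same operations as local files.
• Location transparency: a file's name does not reveal where the file is stored.
• Location independence: a file's name does not change when the file moves to a different location.
• User mobility: users can reach shared files from any node in the network.
• Fault tolerance: the system keeps working, perhaps in a degraded form, when a component fails.
• Scalability: the system can grow to more nodes and users without a disproportionate loss of performance.
• File mobility: files can be moved from one location to another in a running system.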

Design Considerations
• Distributed file systems can be compared by how they handle these issues:
  - Name space.
  - Stateful or stateless operation.
  - Semantics of sharing.
  - Remote access methods.

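Network File System (NFS)
• The architecture is based on the client-server model.
• Clients and servers communicate via Remote Procedure Calls (RPC).
• An NFS server exports one or more file systems.
• Clients mount such a file system.
  - Hard and soft mounts: on a hard mount the client retries a request until the server answers; on a soft mount it gives up after a set number of retries and returns an error.
  - e.g.: mount -t nfs nfssrv:/usr /usr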
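The difference between hard and soft mounts comes down to a retry policy. The Python sketch below is a hypothetical model, not NFS client code: `send_rpc` stands in for one RPC attempt, `RPCTimeout` for an attempt that got no reply, and `retrans` for the soft-mount retry limit. A hard mount keeps retrying (and would hang forever if the server never came back); a soft mount reports EIO once its retries run out.

```python
import errno


class RPCTimeout(Exception):
    """Hypothetical: one RPC attempt received no reply in time."""


def call_with_mount_policy(send_rpc, soft, retrans=3):
    """Issue one NFS-style request under hard- or soft-mount retry rules.

    send_rpc: zero-argument callable that returns a reply or raises RPCTimeout.
    soft:     True for a soft mount, False for a hard mount.
    retrans:  number of retries a soft mount allows before failing.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return send_rpc()
        except RPCTimeout:
            if soft and attempt > retrans:
                # Soft mount: give up and report an I/O error to the caller.
                raise OSError(errno.EIO, "NFS server not responding")
            # Hard mount, or a soft mount with retries left: try again.
            # (A real client would also wait before retransmitting.)


def make_flaky_server(failures, reply):
    """Hypothetical server that ignores its first `failures` requests."""
    remaining = [failures]

    def send_rpc():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RPCTimeout()
        return reply

    return send_rpc


# Server is down for 5 attempts: the hard mount waits it out.
print(call_with_mount_policy(make_flaky_server(5, "ok"), soft=False))  # ok

# Same outage on a soft mount with 3 retries: the call fails with EIO.
try:
    call_with_mount_policy(make_flaky_server(5, "ok"), soft=True, retrans=3)
except OSError as err:
    print(err.errno == errno.EIO)  # True
```

Soft mounts keep applications from hanging, at the cost of surfacing I/O errors that programs written for local disks may not expect.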

Design Goals
• NFS should not be restricted to UNIX.
• The protocol should not depend on any particular hardware.
• Recovery mechanisms should be simple.
• Remote files should be accessible transparently.
• NFS performance should be comparable to that of a local disk.
• The implementation must be transport independent.
• UNIX file system semantics must be maintained for UNIX clients.

NFS Components
• The NFS protocol.
• The Remote Procedure Call (RPC) protocol.
• The External Data Representation (XDR).
• The NFS server code.
• The NFS client code.
• The Mount protocol.
• Daemon processes.
• The Network Lock Manager (NLM).
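XDR gives RPC arguments a machine-independent byte layout: integers take 4 bytes in big-endian order, and variable-length data carries a 4-byte length and is zero-padded to a 4-byte boundary (RFC 4506). The sketch below encodes and decodes those two types by hand with `struct`; the helper names and the sample message (an integer followed by a filename) are hypothetical, chosen only to show the layout.

```python
import struct


def xdr_uint(value):
    """Encode an unsigned 32-bit integer: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_string(text):
    """Encode a string: 4-byte length, the bytes, then zero padding to a multiple of 4."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def xdr_decode_uint(buf, offset):
    """Decode an unsigned 32-bit integer; return (value, next offset)."""
    (value,) = struct.unpack_from(">I", buf, offset)
    return value, offset + 4


def xdr_decode_string(buf, offset):
    """Decode a string; return (text, next offset), skipping the padding."""
    length, offset = xdr_decode_uint(buf, offset)
    text = buf[offset:offset + length].decode("ascii")
    padding = (4 - length % 4) % 4
    return text, offset + length + padding


# Hypothetical message: an integer followed by a filename.
message = xdr_uint(7) + xdr_string("passwd")
print(message.hex())  # 00000007000000067061737377640000

number, pos = xdr_decode_uint(message, 0)
name, pos = xdr_decode_string(message, pos)
print(number, name, pos)  # 7 passwd 16
```

Because both sides agree on this layout, a big-endian server and a little-endian client decode the same bytes to the same values.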

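NFS Implementation
• Control flow: when a process on the client makes a system call that operates on an NFS file, the file-system-independent code identifies the vnode of the file and invokes the relevant vnode operation.
• File handles: the NFS protocol associates an object called the file handle with each file or directory. The server generates it; the client treats it as opaque and passes it back in later requests.
• The mount operation: the client sends the pathname of an exported directory to the server's mount daemon and receives the file handle of that directory.
• Pathname lookup: the client resolves a pathname one component at a time, sending a separate LOOKUP request for each component.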
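A minimal sketch of file handles and component-by-component lookup, assuming a toy in-memory server. `ToyNFSServer`, `FileHandle`, `resolve`, and `recycle_inode` are hypothetical names. The handle here is a (file system ID, inode number, generation number) triple, and bumping the generation number makes old handles stale. Because every request carries the handle it operates on, the server keeps no per-client state between calls.

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    """Opaque to the client; in this sketch it names (fsid, inode, generation)."""
    fsid: int
    inode: int
    generation: int


class ToyNFSServer:
    """Hypothetical stateless server: each request carries the handle it uses."""

    def __init__(self):
        self.fsid = 1
        # Directory inode -> {name: child inode}. Inode 12 is a regular file.
        self.dirs = {2: {"usr": 10}, 10: {"bin": 11}, 11: {"ls": 12}}
        self.generation = {2: 1, 10: 1, 11: 1, 12: 1}
        self.lookups = 0  # counts LOOKUP requests received

    def _handle(self, inode):
        return FileHandle(self.fsid, inode, self.generation[inode])

    def mount(self):
        """Mount protocol: return the handle of the exported root directory."""
        return self._handle(2)

    def lookup(self, dir_fh, name):
        """LOOKUP: resolve a single name inside the directory named by dir_fh."""
        self.lookups += 1
        if self.generation.get(dir_fh.inode) != dir_fh.generation:
            raise OSError(errno.ESTALE, "stale file handle")
        entries = self.dirs.get(dir_fh.inode)
        if entries is None:
            raise NotADirectoryError(name)
        if name not in entries:
            raise FileNotFoundError(name)
        return self._handle(entries[name])

    def recycle_inode(self, inode):
        """Simulate deleting a file and reusing its inode number."""
        self.generation[inode] += 1


def resolve(server, root_fh, path):
    """Client side: one LOOKUP per pathname component."""
    fh = root_fh
    for name in path.strip("/").split("/"):
        if name:
            fh = server.lookup(fh, name)
    return fh


server = ToyNFSServer()
root = server.mount()
usr = resolve(server, root, "/usr")
ls = resolve(server, root, "/usr/bin/ls")
print(ls.inode, server.lookups)  # 12 4

server.recycle_inode(10)  # the old handle for /usr is now stale
try:
    server.lookup(usr, "bin")
except OSError as err:
    print(err.errno == errno.ESTALE)  # True
```

One request per component costs extra round trips, but it lets the client notice its own mount points along the path instead of the server walking past them.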

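UNIX Semantics
• Open file permissions: UNIX checks permissions only when a file is opened, but a stateless NFS server checks them on every read and write.
• Deletion of open files: UNIX lets a process keep using a file after it is unlinked; an NFS client instead renames such a file and removes it on the last close.
• Reads and writes: a large NFS read or write may be split into several RPCs, so it is not atomic the way a local UNIX read or write is.

NFS Performance
• Performance bottlenecks.
• Client-side caching.
• Deferral of writes.
• The retransmissions cache.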
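The retransmissions cache exists because some requests are not idempotent. If the reply to a REMOVE is lost, the client retransmits the request, and a server that simply re-executes it reports an error even though the first attempt succeeded. A minimal sketch, assuming replies are cached by (client, transaction ID); `ToyServer`, `handle_remove`, and the `use_cache` switch are hypothetical, and `use_cache=False` shows the failure the cache prevents.

```python
from collections import OrderedDict


class ToyServer:
    """Hypothetical NFS-like server with a retransmissions cache for REMOVE."""

    def __init__(self, files, cache_size=64):
        self.files = set(files)
        self.replies = OrderedDict()  # (client, xid) -> reply already sent
        self.cache_size = cache_size

    def _do_remove(self, name):
        if name in self.files:
            self.files.remove(name)
            return "OK"
        return "ENOENT"

    def handle_remove(self, client, xid, name, use_cache=True):
        key = (client, xid)
        if use_cache and key in self.replies:
            # Retransmission of a request already executed: replay the reply.
            return self.replies[key]
        reply = self._do_remove(name)
        if use_cache:
            self.replies[key] = reply
            if len(self.replies) > self.cache_size:
                self.replies.popitem(last=False)  # evict the oldest entry
        return reply


# The first reply is "lost", so the client resends the same request (same xid).
server = ToyServer({"report.txt"})
print(server.handle_remove("clientA", 42, "report.txt"))  # OK
print(server.handle_remove("clientA", 42, "report.txt"))  # OK (replayed)

# Without the cache, the retransmission is executed again and fails.
server = ToyServer({"report.txt"})
print(server.handle_remove("clientA", 42, "report.txt", use_cache=False))  # OK
print(server.handle_remove("clientA", 42, "report.txt", use_cache=False))  # ENOENT
```

The cache is bounded, so a retransmission that arrives after its entry has been evicted is executed again; the sketch evicts in arrival order.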

Remote File Sharing (RFS)
• Introduced by AT&T.
• RFS uses the client-server model.
• The design goal is to provide transparent access to remote files while preserving UNIX semantics.
• RFS is a completely stateful architecture.
• RFS uses a reliable, virtual-circuit transport service such as TCP/IP. A virtual circuit is established between a client and a server during the first mount operation.
• RFS is designed to be independent of the underlying network.
• A name server maps advertised resource names to the servers that provide them.

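RFS Implementation
• Remote mount: an RFS server advertises a directory under a resource name using the advfs system call; clients then mount the directory by that resource name.
• RFS clients and servers: a client can access an RFS file either through its pathname or through a file descriptor.
• Crash recovery: stateful systems need elaborate crash recovery mechanisms, because when one side fails the other must detect the failure and clean up the state it holds on the failed side's behalf.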
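A minimal sketch of two RFS mechanisms described above: advertised resources resolved through a name server, and a stateful server that must clean up after a client crash. `NameServer.advertise` only models the effect of the advfs system call, not its interface, and all class and method names here are hypothetical.

```python
class NameServer:
    """Hypothetical RFS-style name server: resource name -> server."""

    def __init__(self):
        self.resources = {}

    def advertise(self, resource, server):
        # Models the effect of advertising (advfs); not its real interface.
        if resource in self.resources:
            raise ValueError(f"resource {resource!r} is already advertised")
        self.resources[resource] = server

    def query(self, resource):
        return self.resources[resource]


class StatefulServer:
    """Hypothetical stateful server: remembers which files each client has open."""

    def __init__(self, name):
        self.name = name
        self.open_files = {}  # client -> set of open paths

    def open(self, client, path):
        self.open_files.setdefault(client, set()).add(path)

    def close(self, client, path):
        self.open_files.get(client, set()).discard(path)

    def client_crashed(self, client):
        """Crash recovery: drop every piece of state held for the dead client."""
        return self.open_files.pop(client, set())


names = NameServer()
srv = StatefulServer("srvA")
names.advertise("PROJECTS", srv)

# A client mounts by resource name, then opens two files and closes one.
mounted = names.query("PROJECTS")
mounted.open("client1", "/plan.txt")
mounted.open("client1", "/notes.txt")
mounted.close("client1", "/notes.txt")

# The virtual circuit to client1 breaks; the server cleans up.
print(sorted(srv.client_crashed("client1")))  # ['/plan.txt']
print(srv.open_files)  # {}
```

A stateless NFS server has nothing comparable to clean up, which is exactly why its recovery is simpler.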

Client-Side Caching: client caching is enabled at mount time. The cache is strictly write-through: data is written to the server immediately after the local cached copy is updated, so the server's copy is always current.
Cache Consistency: any modification of a file invalidates the cached copies on all other clients.
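A minimal sketch of write-through caching with invalidation, assuming keys of the form (file, block number) stand in for cached file blocks and that the server can notify every registered client directly. `Server`, `CachingClient`, and `invalidate` are hypothetical. A write updates the local copy, goes straight to the server, and discards the copies held by other clients.

```python
class Server:
    """Hypothetical RFS-like server that invalidates other clients' caches."""

    def __init__(self):
        self.blocks = {}   # (file, block number) -> data
        self.clients = []

    def register(self, client):
        self.clients.append(client)

    def read(self, key):
        return self.blocks.get(key, b"")

    def write(self, writer, key, data):
        self.blocks[key] = data
        for client in self.clients:
            if client is not writer:
                client.invalidate(key)


class CachingClient:
    """Hypothetical client with a strictly write-through cache."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.server_reads = 0  # how often a read had to go to the server
        server.register(self)

    def read(self, key):
        if key not in self.cache:
            self.server_reads += 1
            self.cache[key] = self.server.read(key)
        return self.cache[key]

    def write(self, key, data):
        self.cache[key] = data                 # update the local copy,
        self.server.write(self, key, data)     # then write it through at once

    def invalidate(self, key):
        self.cache.pop(key, None)


server = Server()
a, b = CachingClient(server), CachingClient(server)
a.write(("f", 0), b"v1")
print(b.read(("f", 0)), b.read(("f", 0)), b.server_reads)  # b'v1' b'v1' 1
a.write(("f", 0), b"v2")   # invalidates b's cached copy
print(b.read(("f", 0)), b.server_reads)  # b'v2' 2
```

Reads are served locally until another client's write invalidates them; writes never sit in the cache, so the server never holds stale data.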

Andrew File System (AFS)
• A distributed file system designed to scale to thousands of users, such as those on a university campus.
• Developed by Carnegie Mellon University and IBM.
• AFS is UNIX compatible.
• It provides a uniform, location-independent name space for shared files.
• AFS is designed to tolerate server failures.
• AFS provides security without trusting clients or the network.
• Performance should be comparable to that of a time-sharing system.

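The Andrew File System
• Scalable architecture: clients cache whole files, so most operations run locally and each server can support many clients.
• Storage and name space organization: files are grouped into volumes, and all clients see the same shared name space.
• Session semantics: changes to a file become visible to other clients only after the writer closes the file, and only to opens that happen after that close.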

AFS Implementation
• Caching and consistency: the cache manager implements the vnode operations for AFS files on clients.
• Pathname lookup: this is a CPU-intensive operation, and AFS performs it directly on clients.
• Security: AFS treats the collection of servers as the security boundary. It uses the Kerberos authentication system.
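A minimal sketch of AFS-style whole-file caching with callbacks, the mechanism by which the cache manager keeps cached files consistent under session semantics. Assumptions: `AFSServer` and `CacheManager` are hypothetical stand-ins for the Vice server and the Venus cache manager, files are short strings, and a modified file is sent to the server only on close. When a client stores a file, the server breaks the callbacks held by other clients, so their next open fetches the new version.

```python
class AFSServer:
    """Hypothetical Vice-like server that tracks callback promises."""

    def __init__(self):
        self.files = {}       # name -> contents
        self.callbacks = {}   # name -> clients promised a callback

    def fetch(self, client, name):
        # Hand out the whole file and promise to notify the client of changes.
        self.callbacks.setdefault(name, set()).add(client)
        return self.files.get(name, "")

    def store(self, client, name, contents):
        self.files[name] = contents
        # Break the callback of every other client caching this file.
        for other in self.callbacks.get(name, set()) - {client}:
            other.break_callback(name)
        self.callbacks[name] = {client}


class CacheManager:
    """Hypothetical Venus-like cache manager: whole files, write-back on close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}      # name -> cached contents
        self.valid = set()   # names whose callback is still held
        self.fetches = 0

    def open(self, name):
        if name not in self.valid:
            self.cache[name] = self.server.fetch(self, name)
            self.valid.add(name)
            self.fetches += 1
        return self.cache[name]  # the session works on this local copy

    def close(self, name, new_contents=None):
        if new_contents is not None:
            # Session semantics: changes reach the server only on close.
            self.cache[name] = new_contents
            self.server.store(self, name, new_contents)

    def break_callback(self, name):
        self.valid.discard(name)


server = AFSServer()
server.files["notes"] = "v1"
a, b = CacheManager(server), CacheManager(server)

a_copy = a.open("notes")
b_copy = b.open("notes")
a.close("notes", a_copy.replace("v1", "v2"))  # stores v2, breaks b's callback
print(b_copy)            # v1: b's open session still sees its own copy
print(b.open("notes"))   # v2: the callback was broken, so b refetches
print(a.fetches, b.fetches)  # 1 2
```

As long as a client holds a callback, it can open a cached file without contacting the server at all, which is a large part of what lets one server support many clients.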

The DCE Distributed File System
• In 1989 the Transarc Corporation took over the development and production of AFS.
• DFS is similar to AFS in several respects. It improves upon AFS in the following ways:
  - It allows a single machine to be both a server and a client.
  - It provides stronger, UNIX-like sharing semantics and consistency guarantees.
  - It allows greater interoperability with other file systems.

Questions?
