This presentation compares centralized network file systems (NFS, AFS, CIFS) with truly distributed file systems (xFS, JetFile, Coda). Centralized systems keep primary storage on a single server, which limits scalability, performance, and reliability. Distributed systems store data across multiple servers, which improves scalability and enables cooperative caching. The slides survey each system's mechanisms for cache coherency, versioning, and data replication, and close with applications and the current status of each project.
Truly Distributed File Systems Paul Timmins CS 535
Centralized Network File Systems • NFS, AFS, and CIFS provide distributed access to a centralized file system • Primary storage resides on one server • Data may be redundant and possibly replicated • Metadata is maintained by the server
Problems in Centralized Systems • Scalability: Each client adds a fixed overhead • Performance: Server becomes the bottleneck • Reliability: Data resides with a single server
Centralized Network File Systems (diagram: one Server and four Clients)
Distributed File Systems • Physical data resides on the disks of multiple servers • Metadata is maintained by multiple servers, although one may be elected as master • Cooperative Caching • Disconnected Operation • Cache Coherency
xFS • Clusters of clients • Log-structured file system, striped with parity • Block-level exclusive locks for writes • Server sends cache invalidations • Cooperative caching: clients serve clients • Ownership-based
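The "striped with parity" bullet can be illustrated with a minimal sketch. It assumes simple XOR parity over equal-length blocks, which is a common scheme but not something the slides specify for xFS; the function names `xor_blocks`, `make_stripe`, and `rebuild` are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block (assumed layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe, lost_index):
    """Recover the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

# Three data blocks, each imagined to live on a different storage node.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])

# Pretend the node holding block 1 failed, then rebuild it from the rest.
recovered = rebuild(stripe, 1)
print(recovered)
```

Because XOR is its own inverse, any single lost block (data or parity) can be rebuilt from the others, which is what lets a stripe survive the failure of one storage node.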
JetFile • Versioning file system, allows simultaneous writes • Clients can serve data to other clients • Modified data can be cached at the client • Invalidation notices are sent from client to multiple clients • Coherency is dependent on multicast reliability
Coda • Replicated (mirrored) servers, with one elected as lock manager • Servers replicate on file access • Versioning file system • Disconnected operation allowed • Conflicting simultaneous or divergent writes that cannot be resolved automatically must be resolved manually
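One way to tell an ordinary stale replica from a true conflict is to compare per-server update counters (a version vector). The sketch below assumes that representation for illustration; the slides say only that the file system is versioned, and the `compare` function and server names are hypothetical.

```python
def compare(a, b):
    """Compare two version vectors (dicts mapping server name -> update count)."""
    servers = set(a) | set(b)
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in servers)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in servers)
    if a_ahead and b_ahead:
        return "conflict"   # divergent writes: neither replica dominates
    if a_ahead:
        return "a_newer"    # replica b is merely stale and can be updated
    if b_ahead:
        return "b_newer"
    return "equal"

# Replica A saw one more update on s1: B can simply catch up.
print(compare({"s1": 2, "s2": 1}, {"s1": 1, "s2": 1}))

# Each replica saw an update the other missed, e.g. after disconnected operation.
print(compare({"s1": 2, "s2": 1}, {"s1": 1, "s2": 2}))
```

Only the "conflict" outcome needs resolution logic; when no automatic resolver applies, it is the case that falls back to manual repair.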
Cache Invalidation • Invalidations are transmitted to clients upon changes, as opposed to NFS-style validate-on-request • Coda and xFS track clients' caches, which consumes some server resources • JetFile uses multicast from one client to all other clients to invalidate their caches
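The server-tracked variant described above can be sketched as follows. This is an in-memory toy under stated assumptions, not any of these systems' actual protocols: the class names `CallbackServer` and `CachingClient` and their methods are hypothetical, and network messages are modeled as direct method calls.

```python
class CachingClient:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # path -> cached file contents

    def read(self, server, path):
        # A cached copy is trusted until the server invalidates it,
        # so a cache hit needs no validation round trip.
        if path not in self.cache:
            self.cache[path] = server.fetch(self, path)
        return self.cache[path]

    def invalidate(self, path):
        self.cache.pop(path, None)


class CallbackServer:
    def __init__(self):
        self.files = {}    # path -> contents
        self.cachers = {}  # path -> set of clients holding a copy (server state)

    def fetch(self, client, path):
        self.cachers.setdefault(path, set()).add(client)
        return self.files[path]

    def write(self, writer, path, data):
        self.files[path] = data
        # Push invalidations to every other client known to cache the file.
        for client in self.cachers.pop(path, set()):
            if client is not writer:
                client.invalidate(path)
        writer.cache[path] = data
        self.cachers[path] = {writer}


server = CallbackServer()
server.files["foo"] = "v1"
a, b = CachingClient("A"), CachingClient("B")
a.read(server, "foo")
b.read(server, "foo")
server.write(a, "foo", "v2")   # B's cached copy is invalidated here
print(b.read(server, "foo"))   # B misses its cache and refetches
```

The `cachers` table is the per-client state the slide refers to: it grows with the number of clients and cached files, which is the server-side cost of pushing invalidations instead of answering validation requests.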
Cooperative Caching (diagram: Server, Client A, Client B) • 1. Client A requests and reads Foo • 2. Client B requests Foo, but is referred to Client A • 3. Client B reads Foo from Client A
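The three steps in this diagram can be sketched as a small in-memory simulation. The class names `DirectoryServer` and `PeerClient`, the `("referral", peer)` reply shape, and the `log` list are all assumptions made for illustration; the slides do not describe a wire format.

```python
class DirectoryServer:
    def __init__(self, files):
        self.files = files
        self.holder = {}  # path -> a client known to cache that file

    def request(self, client, path):
        peer = self.holder.get(path)
        if peer is not None and peer is not client:
            return ("referral", peer)   # step 2: refer the requester to a peer
        self.holder[path] = client
        return ("data", self.files[path])


class PeerClient:
    def __init__(self, name):
        self.name = name
        self.cache = {}
        self.log = []  # records where each file was actually read from

    def read(self, server, path):
        if path in self.cache:
            return self.cache[path]
        kind, payload = server.request(self, path)
        if kind == "referral":
            data = payload.cache[path]  # step 3: read from the peer's cache
            self.log.append(f"{path} from {payload.name}")
        else:
            data = payload              # step 1: read from the server
            self.log.append(f"{path} from server")
        self.cache[path] = data
        return data


server = DirectoryServer({"Foo": "contents of Foo"})
client_a, client_b = PeerClient("A"), PeerClient("B")
client_a.read(server, "Foo")
client_b.read(server, "Foo")
print(client_a.log, client_b.log)
```

The server still answers every first request, but it only ships file data once; later readers are served from a peer's memory, which is how cooperative caching takes load off the server.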
Cache Coherency (ranked from most to least coherent) • xFS (ownership-based) • AFS (stateful server invalidation) • JetFile (multicast client invalidation) • NFS (stateless) • Coda (disconnected)
Applications • WWW Replication • FTP Sites • Network Computers • Corporate file system
Current Status • xFS and the Berkeley NOW project are dead; it is unclear whether any further xFS work will be done • JetFile seems to be under continued research, but with no significant new work • Coda is in use by developers at CMU. There is no significant new research; the focus is on stabilizing. Recent work includes a Solaris port and bug fixing.
Summary • Naming: Location-independent • Migration: Moving the location of data is transparent in all cases • Directories: Handled the same as in a local file system • Sharing Semantics: JetFile and Coda are Unix-style, xFS is session-based • Caching: Clients cache, and can serve from their cache • Locking: JetFile and Coda don’t allow locking, xFS uses locks
Summary • Replication/Reliability: All provide server replication • Scalability: Should scale to thousands and tens of thousands • Homogeneity: Not required • File system interface: Unix-style semantics • Security: xFS provides encryption but only marginal authentication, JetFile provides none, Coda provides authentication and encryption • State/Stateless: xFS and Coda are stateful, JetFile is stateless but still sends cache invalidations