
Introduction to DFS

alyssa

Presentation Transcript


  1. Introduction to DFS

  2. Distributed File Systems
     • A file system whose clients, servers and storage devices are dispersed among the machines of a distributed system
     • File system operations have to be carried out over the network
     • A good DFS should ensure transparency
     • Clients should have the look and feel of a conventional file system

  3. Naming and Transparency
     • Naming is the mapping between logical and physical objects
     • Location transparency: a file's name reveals nothing about its physical storage location
     • Location independence: a file's name need not change when its physical storage location changes
        • Location-independent files are essentially logical data containers
     • Location transparency hides the association between names and physical storage

  4. Naming Schemes
     • Combination of host name and local name
        • The local name is a Unix-like path
        • Neither location transparent nor location independent
     • Attaching remote directories to the local directory
        • Popularized by Sun's NFS
        • Appears as a coherent directory tree
     • Globally unique names
        • Truly transparent
        • A single global naming structure spans all names
        • Difficult to achieve because of special files

  5. Implementing Naming Schemes
     • Transparent naming requires a mapping between names and their associated locations
     • Aggregating files into components for scalability and manageability
     • Hierarchical directory trees
     • Replication and caching
     • Maintaining consistency of the cached view
     • Location-independent file identifiers
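
The role of location-independent file identifiers can be illustrated with a small two-level mapping. This is only a sketch under assumed names (`NameService`, `create`, `migrate`, `resolve`, and the server and path strings are all hypothetical, not part of the slides): the path maps to a file identifier, and the identifier maps to a physical location, so moving a file touches only the second table.

```python
class NameService:
    """Hypothetical two-level mapping: path -> file id -> physical location."""

    def __init__(self):
        self.name_to_id = {}      # logical path -> location-independent file id
        self.id_to_location = {}  # file id -> (server, local path)

    def create(self, path, file_id, server, local_path):
        self.name_to_id[path] = file_id
        self.id_to_location[file_id] = (server, local_path)

    def migrate(self, file_id, server, local_path):
        # Moving a file updates only the id -> location table;
        # the user-visible name is untouched.
        self.id_to_location[file_id] = (server, local_path)

    def resolve(self, path):
        return self.id_to_location[self.name_to_id[path]]


ns = NameService()
ns.create("/projects/report.txt", 42, "serverA", "/disk1/f42")
before = ns.resolve("/projects/report.txt")
ns.migrate(42, "serverB", "/disk7/f42")
after = ns.resolve("/projects/report.txt")
print(before, after)
```

The same path resolves to a different server after the migration, which is the behavior location independence asks for.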

  6. Accessing Remote Files
     • Needs network data transfer
     • Remote service mechanism
        • Remote procedure call
     • Caching for improved performance

  7. Caching
     • The idea is to fetch once and use multiple times
     • If the requested data is not in the cache, get it from the server
     • Store the fetched data
     • Perform accesses on the local copy
     • Replace data when the cache becomes full
     • One master copy at the server, several secondary copies at clients
     • Granularity: from file blocks to entire files
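
The fetch-once, use-many-times idea can be sketched as a small client-side block cache. The slide does not say which replacement policy is used; least-recently-used (LRU) is an assumption here, and `BlockCache` and `fake_server` are hypothetical names standing in for a real client and a real network fetch.

```python
from collections import OrderedDict


class BlockCache:
    """Client-side block cache; LRU replacement is an assumption."""

    def __init__(self, fetch_from_server, capacity):
        self.fetch = fetch_from_server
        self.capacity = capacity
        self.blocks = OrderedDict()  # (file_id, block_no) -> data

    def read(self, file_id, block_no):
        key = (file_id, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)      # hit: mark as recently used
            return self.blocks[key]
        data = self.fetch(file_id, block_no)  # miss: go to the server
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict least recently used
        return data


server_calls = []


def fake_server(file_id, block_no):
    # Stand-in for a network fetch; records each request it receives.
    server_calls.append((file_id, block_no))
    return f"{file_id}:{block_no}".encode()


cache = BlockCache(fake_server, capacity=2)
cache.read("f", 0)
cache.read("f", 0)  # served locally, no server call
cache.read("f", 1)
cache.read("f", 2)  # cache is full, block 0 is evicted
cache.read("f", 0)  # fetched again after eviction
print(server_calls)
```

Five reads result in four server requests: the repeated read of block 0 is a hit, and the final read is a miss only because the block was replaced when the cache filled up.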

  8. Cache Location
     • Main memory
        • Workstations can be diskless
        • Faster access
        • Technology trends: memory accesses are becoming faster
        • Server caches will be in main memory anyway, so the caching code can be reused
     • Local disks
        • Reliability via persistence
     • Hybrid schemes
        • Best of both worlds

  9. Cache Update Policy
     • The policy that determines when modified data is reflected on the master copy
     • Can have a significant impact on performance
     • Write-through policy
        • All writes are reflected immediately on the master copy
        • Blocking: the writer waits for the server
     • Delayed writes
        • Write on flush
        • Periodic writes
        • Write on close
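
The difference between write-through and one delayed-write variant (write on close) can be sketched as follows. `Master` and `CachedFile` are hypothetical names; the master copy is just an in-memory dictionary, and the counter exists only to show how many times the server is contacted.

```python
class Master:
    """Stand-in for the server holding the master copy."""

    def __init__(self):
        self.data = {}
        self.writes = 0  # number of times the master copy was updated

    def store(self, name, content):
        self.data[name] = content
        self.writes += 1


class CachedFile:
    """Client-side cached copy with a selectable update policy."""

    def __init__(self, master, name, write_through):
        self.master = master
        self.name = name
        self.write_through = write_through
        self.content = master.data.get(name, "")
        self.dirty = False

    def write(self, text):
        self.content += text
        if self.write_through:
            # Write-through: every write reaches the master immediately.
            self.master.store(self.name, self.content)
        else:
            # Delayed write: only remember that the copy was modified.
            self.dirty = True

    def close(self):
        if self.dirty:
            # Write on close: one update carries all modifications.
            self.master.store(self.name, self.content)
            self.dirty = False


wt_master = Master()
f = CachedFile(wt_master, "a", write_through=True)
f.write("x")
f.write("y")
f.close()

dc_master = Master()
g = CachedFile(dc_master, "a", write_through=False)
g.write("x")
g.write("y")
writes_before_close = dc_master.writes
g.close()
print(wt_master.writes, writes_before_close, dc_master.writes)
```

Both masters end with the same content, but the delayed policy contacts the server once instead of twice, at the price of the master copy being out of date until the file is closed.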

  10. Ensuring Consistency
     • Ensuring that the data being read is consistent with the master copy
     • Client-initiated approach
        • The client validates with the server whether its cached data is up to date
        • The frequency of validation is the main issue
           • Check on first access
           • Check on every access
           • Periodic checking
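
A client-initiated check can be sketched with version numbers. The slides do not say how validity is tested; comparing a per-file version counter is an assumption, and `VersionedServer` and `ValidatingClient` are hypothetical names. The flag switches between checking on every access and checking only on first access.

```python
class VersionedServer:
    """Server that bumps a version number on every write (an assumption)."""

    def __init__(self):
        self.files = {}  # name -> (version, data)

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)

    def version(self, name):
        return self.files[name][0]

    def fetch(self, name):
        return self.files[name]


class ValidatingClient:
    def __init__(self, server, check_every_access):
        self.server = server
        self.check_every_access = check_every_access
        self.cache = {}  # name -> (version, data)

    def read(self, name):
        cached = self.cache.get(name)
        stale = cached is None or (
            self.check_every_access
            and cached[0] != self.server.version(name)
        )
        if stale:
            cached = self.server.fetch(name)
            self.cache[name] = cached
        return cached[1]


srv = VersionedServer()
srv.write("f", "v1")
eager = ValidatingClient(srv, check_every_access=True)
lazy = ValidatingClient(srv, check_every_access=False)
eager.read("f")
lazy.read("f")
srv.write("f", "v2")  # another client updates the master copy
eager_result = eager.read("f")
lazy_result = lazy.read("f")
print(eager_result, lazy_result)
```

The client that validates on every access sees the new data but pays a server round trip per read; the client that checks only on first access keeps reading its stale copy, which is the trade-off behind the frequency question.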

  11. Server-Initiated Approaches
     • The server records the files each client is accessing
     • It detects potential inconsistencies and notifies clients
     • A conflict occurs when at least two clients cache the same file and one of them is writing
     • Invalidation-based or update-based mechanisms
     • Session semantics
        • Consistency enforced upon file closing
     • Unix semantics
        • Consistency enforced upon write
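
An invalidation-based, server-initiated scheme can be sketched as below. `CallbackServer` and `CallbackClient` are hypothetical names, and clients are called directly rather than over a network. Because every write is pushed to the server at once, this sketch is closer to the Unix-semantics case than to session semantics.

```python
class CallbackServer:
    """Tracks which clients cache each file and invalidates on write."""

    def __init__(self):
        self.data = {}
        self.cachers = {}  # file name -> set of clients caching it

    def fetch(self, client, name):
        self.cachers.setdefault(name, set()).add(client)
        return self.data.get(name)

    def write(self, writer, name, value):
        self.data[name] = value
        # Notify every other client holding a copy that it is now stale.
        for client in self.cachers.get(name, set()) - {writer}:
            client.invalidate(name)
        self.cachers[name] = {writer}


class CallbackClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def write(self, name, value):
        self.cache[name] = value
        self.server.write(self, name, value)

    def invalidate(self, name):
        self.cache.pop(name, None)


server = CallbackServer()
a = CallbackClient(server)
b = CallbackClient(server)
a.write("f", "old")
b_first = b.read("f")
a.write("f", "new")              # server invalidates b's cached copy
b_has_copy = "f" in b.cache
b_second = b.read("f")           # miss, so b refetches from the server
print(b_first, b_has_copy, b_second)
```

The reader never polls: it keeps using its copy until the server tells it the copy is invalid, which requires the server to keep per-client state.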

  12. Why or Why Not Caching
     • Locality of accesses
     • Gains in performance and scalability
     • Transferring big chunks of data leads to lower overhead
     • Disk accesses can be optimized for larger chunks of data
     • Consistency maintenance is the cost
     • Memory and disk space requirements at clients

  13. Stateful vs. Stateless Servers
     • Stateful servers maintain information about the files being accessed by clients
        • Clients are given connection ids, which act as an index into inode tables
        • Performance gains, e.g. prefetching file blocks
     • Stateless servers maintain no state
        • Each request is self-contained
     • Reliability is the issue
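
The contrast can be sketched with two toy servers. All names here (`FILES`, `StatefulServer`, `StatelessServer`, `crash`) are hypothetical, and a dictionary keyed by connection id stands in for the inode-table index mentioned above. The `crash` method simulates a restart that wipes the server's memory.

```python
FILES = {"notes.txt": b"distributed file systems"}


class StatefulServer:
    """Remembers each open file and the client's current offset."""

    def __init__(self):
        self.open_files = {}  # connection id -> [name, offset]
        self.next_id = 0

    def open(self, name):
        self.next_id += 1
        self.open_files[self.next_id] = [name, 0]
        return self.next_id

    def read(self, conn_id, count):
        entry = self.open_files[conn_id]
        name, offset = entry
        entry[1] = offset + count  # server advances the offset itself
        return FILES[name][offset:offset + count]

    def crash(self):
        self.open_files.clear()    # in-memory state is lost on restart


class StatelessServer:
    """Keeps nothing between requests; each request names everything."""

    def read(self, name, offset, count):
        return FILES[name][offset:offset + count]

    def crash(self):
        pass                       # nothing to lose


stateful = StatefulServer()
cid = stateful.open("notes.txt")
first = stateful.read(cid, 11)
stateful.crash()
try:
    stateful.read(cid, 4)
    lost_after_crash = False
except KeyError:
    lost_after_crash = True        # the connection id is no longer known

stateless = StatelessServer()
stateless.read("notes.txt", 0, 11)
stateless.crash()
second = stateless.read("notes.txt", 12, 4)
print(first, lost_after_crash, second)
```

The stateful requests are shorter and the server knows enough to prefetch, but a restart invalidates every connection id; the stateless client simply repeats its self-contained request.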
