
Distributed File Systems

Distributed File Systems. Yih-Kuen Tsay Dept. of Information Management National Taiwan University. Purposes of a Distributed File System. Sharing of storage and information across a network Convenience (and efficiency) of a conventional file system


Presentation Transcript


  1. Distributed File Systems. Yih-Kuen Tsay, Dept. of Information Management, National Taiwan University. Distributed File Systems [2006/11/06] -- 1

  2. Purposes of a Distributed File System • Sharing of storage and information across a network • Convenience (and efficiency) of a conventional file system • Persistent storage that most other services (e.g., Web servers) need

  3. Properties of Storage Systems. Table columns: sharing, persistence, distributed cache/replicas, consistency maintenance, example. Storage systems and examples (consistency type in brackets): • Main memory: RAM [1] • File system: UNIX file system [1] • Distributed file system: Sun NFS • Web: Web server • Distributed shared memory: Ivy (DSM, Ch. 18) • Remote objects (RMI/ORB): CORBA [1] • Persistent object store: CORBA Persistent Object Service [1] • Peer-to-peer storage system: OceanStore (Ch. 10) [2]. Types of consistency: 1: strict one-copy. ✓: slightly weaker guarantees. 2: considerably weaker guarantees. Other properties include availability, timing guarantees, etc. Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  4. Files • Files are an abstraction of permanent storage. • A file is typically defined as a sequence of similar-sized data items along with a set of attributes. • A directory is a file that provides a mapping from text names to internal file identifiers.

  5. File Attributes Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  6. File Systems • Responsible for the (a) organization, (b) storage, (c) retrieval, (d) naming, (e) sharing, and (f) protection of files. • Provide a set of programming operations that characterize the file abstraction, particularly operations to read and write subsequences of data items beginning at any point of a file.

  7. File System Modules A basic distributed file system implements all of the above plus modules for client-server communication and distributed naming and location of files. Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  8. UNIX File Operations Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  9. Distributed File System Requirements • Transparency: access, location, mobility, performance, and scaling transparency. • Concurrency (and Consistency) • Replication/Caching (and Consistency) • Hardware/operating system heterogeneity • Fault-Tolerance • Security (Access Control, Authentication) • Efficiency

  10. A File Service Architecture Note: The modules communicate with one another by remote procedure calls. Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  11. File Service Components • Flat file service: implementing operations on the contents of files, which are referred to by unique file identifiers (UFIDs) • Directory service: mapping text names of files (including directories) to their UFIDs • Client module: integrating and extending the previous two services under a single application programming interface * Why is this structure more open and configurable?

  12. Flat File Service Operations Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  13. Difference from UNIX • Immediate access to files using UFIDs (without open or close) • Read or write starts at the position indicated by a parameter • All operations, except create, are repeatable • Allows a stateless implementation
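The points on this slide can be illustrated with a minimal in-memory sketch. It is not the interface from the cited figure: the class name `FlatFileService`, the integer UFIDs, and the zero-filling of gaps are assumptions made for illustration. What it shows is that read and write take the position as an explicit argument, so the server holds no per-client file pointer and a retransmitted request has the same effect as the original.

```python
import itertools


class FlatFileService:
    """Toy in-memory flat file service keyed by UFID (hypothetical names)."""

    def __init__(self):
        self._files = {}                    # UFID -> bytearray holding the contents
        self._next_id = itertools.count(1)  # stand-in for real UFID generation

    def create(self):
        # Not repeatable: every call yields a new file and a new UFID.
        ufid = next(self._next_id)
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, position, n):
        # The position is an explicit argument; the server keeps no file pointer.
        return bytes(self._files[ufid][position:position + n])

    def write(self, ufid, position, data):
        buf = self._files[ufid]
        if position > len(buf):
            buf.extend(b"\0" * (position - len(buf)))  # zero-fill any gap (assumption)
        buf[position:position + len(data)] = data


service = FlatFileService()
ufid = service.create()
service.write(ufid, 0, b"hello world")
service.write(ufid, 6, b"there")
service.write(ufid, 6, b"there")   # a retransmitted request changes nothing
result = service.read(ufid, 0, 11)
```

Because no open or close call exists, nothing has to be recovered on the server side when a client crashes between requests.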

  14. Access Control • Conventional access rights checks (at open calls) not feasible • Two ‘stateless’ approaches: * Capability (by manipulating the UFID) * User identity sent with every request (adopted in NFS and AFS) • Main problem: forged requests; some authentication mechanism is needed

  15. Capabilities and UFIDs A capability is a binary value that acts as an access key; it can be encoded in the UFID. • Basic construction of a UFID: file group id + file number + random number • Additional field: permissions • Additional field: encryption of the permission field
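A rough sketch of how such a capability might be checked is given below. It makes two assumptions that are not in the slides: a keyed digest (HMAC-SHA256) stands in for the encrypted permission field, and the names `UFID`, `issue`, `allows`, and `SERVER_KEY` are hypothetical. The point it illustrates is that a client cannot widen its own permissions, because the check field can only be recomputed with a secret held by the server.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

SERVER_KEY = secrets.token_bytes(32)   # hypothetical secret known only to the server


@dataclass(frozen=True)
class UFID:
    group_id: int
    file_number: int
    random_part: int
    permissions: str   # e.g. "r" or "rw"
    check: bytes       # keyed digest standing in for the encrypted permission field


def _digest(group_id, file_number, random_part, permissions):
    msg = f"{group_id}:{file_number}:{random_part}:{permissions}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()


def issue(group_id, file_number, permissions):
    """Server side: build a UFID whose check field covers the permissions."""
    random_part = secrets.randbits(32)
    return UFID(group_id, file_number, random_part, permissions,
                _digest(group_id, file_number, random_part, permissions))


def allows(ufid, operation):
    """Server side: accept a request only if the check field matches."""
    expected = _digest(ufid.group_id, ufid.file_number,
                       ufid.random_part, ufid.permissions)
    return hmac.compare_digest(expected, ufid.check) and operation in ufid.permissions


read_only = issue(group_id=7, file_number=42, permissions="r")
# A client edits the permission field but cannot recompute the check field.
forged = UFID(read_only.group_id, read_only.file_number,
              read_only.random_part, "rw", read_only.check)
```

The server needs no per-client state to make this decision, which is what makes the approach compatible with a stateless service.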

  16. Directory Service Operations Note: Each directory is stored as an ordinary file with a UFID. Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  17. The Network File System (NFS) • Introduced by Sun Microsystems in 1985, now an Internet standard • Runs on top of RPC (RFC 1831) • Implemented on most operating systems • Version described here: UNIX implementation of NFS Version 3 (RFC 1813, June 1995) • Most recent version: NFS Version 4 (RFC 3010, December 2000; revised as RFC 3530, April 2003)

  18. NFS Architecture Note: Each computer can act as both a client and a server. Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  19. The Virtual File System Module • Access transparency • File handles (file identifiers): ‘filesystem identifier’ + ‘i-node number’ + ‘i-node generation number’ • One VFS structure for each mounted filesystem: relates a remote filesystem (identified by its file handle obtained at mount time) to a local directory on which it is mounted • One v-node per open file: indicates whether a file is local (i-node) or remote (file handle)
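The three-part file handle can be sketched as follows. The slide does not say why the generation number is included; the usual explanation, assumed here, is that i-node numbers are reused after a file is deleted, so the generation number lets a server reject a handle that refers to an earlier file. `FileHandle` and `ToyServer` are hypothetical names, not part of any NFS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int
    inode_number: int
    generation: int


class ToyServer:
    """Hypothetical server that reuses i-node numbers after deletion."""

    def __init__(self, filesystem_id):
        self.filesystem_id = filesystem_id
        self.generation = {}   # i-node number -> current generation number
        self.in_use = set()    # i-node numbers that currently hold a file

    def create(self, inode_number):
        # Each reuse of an i-node number bumps its generation number.
        self.generation[inode_number] = self.generation.get(inode_number, 0) + 1
        self.in_use.add(inode_number)
        return FileHandle(self.filesystem_id, inode_number,
                          self.generation[inode_number])

    def remove(self, handle):
        self.in_use.discard(handle.inode_number)

    def is_valid(self, handle):
        return (handle.filesystem_id == self.filesystem_id
                and handle.inode_number in self.in_use
                and self.generation.get(handle.inode_number) == handle.generation)


server = ToyServer(filesystem_id=1)
old = server.create(inode_number=12)
server.remove(old)
new = server.create(inode_number=12)   # same i-node number, new generation
```

A client still holding `old` would be told that its handle is stale instead of silently reading the new file.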

  20. The NFS Client Module in UNIX • Integrated with the kernel • Emulates the UNIX file system primitives • A single client module serves all user-level processes • The encryption key for authentication stored in the kernel • Caches file blocks • There is a consistency problem

  21. Access Control and Authentication • Stateless servers: the user’s identity is checked afresh on each request • Authentication information supplied automatically by the RPC system • Security loophole: the client can modify an RPC call to impersonate any user • An encryption option closes this loophole • Securing NFS with Kerberos: full authentication is done when files are mounted; a server retains the current mounts (including user authentication data) at each client

  22. NFS Server Operations Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  23. NFS Server Operations (cont’d) Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  24. Remote File Accesses Note 2: a pathname is resolved to an i-node in an iterative manner using lookup. Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.
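The iterative resolution mentioned in the note can be sketched as one lookup call per pathname component. The dictionary-based directory tree and the function names `lookup` and `resolve` below are assumptions for illustration; a real client would issue each lookup as a remote call and receive a file handle in return.

```python
def lookup(tree, dir_handle, name):
    """Stand-in for one lookup request: the handle for `name` in a directory."""
    return tree[dir_handle][name]


def resolve(tree, root_handle, path):
    """Resolve a pathname one component at a time, counting lookup calls."""
    handle = root_handle
    calls = 0
    for component in path.strip("/").split("/"):
        handle = lookup(tree, handle, component)
        calls += 1
    return handle, calls


# Hypothetical directory tree: directory handle -> {name: handle}
tree = {
    "h_root": {"usr": "h_usr"},
    "h_usr": {"local": "h_local"},
    "h_local": {"notes.txt": "h_notes"},
}
handle, calls = resolve(tree, "h_root", "/usr/local/notes.txt")
```

A three-component path costs three lookups here, which suggests why clients benefit from caching lookup results.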

  25. Automatic Mounting • Mount filesystems when they are referenced and unmount them when they are no longer needed • Implementations: automount (later version: autofs) and amd • A simple form of read-only replication can be achieved: fault tolerance, load balancing

  26. Sample File System Information in UNIX

      saturn:~ 35 % df -k
      Filesystem                     kbytes   capacity  Mounted on
      /dev/dsk/c0t3d0s0              143903   91%       /
      /dev/dsk/c0t3d0s6              267943   99%       /usr
      /dev/dsk/c0t3d0s3              15383    3%        /tmp
      galaxy:/usr/local.real         4030440  53%       /usr/local
      lucky:/var/mail.real           564648   86%       /var/mail
      cosmos:/home.real/student/xxx  3941760  60%       /home/xxx
      galaxy:/home.real/faculty/yyy  2964512  51%       /home/yyy

      * Note: The output of ‘df -k’ has been edited.

  27. Caching – Server Caching • Similar to conventional UNIX’s buffer cache: read-ahead, delayed-write • Extra measures for write operations: delayed-write with the commit operation (default), or write-through (to ensure failure independence)

  28. Caching – Client Caching • Caching results of read, write, getattr, lookup, and readdir • Cache validation based on timestamps: last-validated timestamp and freshness interval; last-modified timestamp • Trade-off between consistency and efficiency • Piggybacking of file attribute values • The bio-daemon processes for implementing read-ahead and delayed-write caching at the client side
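The timestamp-based validation can be sketched as a single check. The slide lists only the ingredients; the condition used below, that an entry is trusted while it is within its freshness interval and is otherwise compared against the server's last-modified time, follows the usual textbook formulation and should be read as an assumption. The function and parameter names are hypothetical.

```python
def is_cache_entry_valid(now, last_validated, freshness_interval,
                         cached_mtime, fetch_server_mtime):
    """Return True if a cached block may be used without refetching it."""
    # Recently validated: trust the cache and skip the round trip.
    if now - last_validated < freshness_interval:
        return True
    # Otherwise ask the server (a getattr-style call) and compare timestamps.
    return cached_mtime == fetch_server_mtime()


getattr_calls = []


def server_mtime():
    """Hypothetical stand-in for a getattr request to the server."""
    getattr_calls.append("getattr")
    return 100


fresh = is_cache_entry_valid(now=12, last_validated=10, freshness_interval=3,
                             cached_mtime=100, fetch_server_mtime=server_mtime)
unchanged = is_cache_entry_valid(now=20, last_validated=10, freshness_interval=3,
                                 cached_mtime=100, fetch_server_mtime=server_mtime)
modified = is_cache_entry_valid(now=20, last_validated=10, freshness_interval=3,
                                cached_mtime=90, fetch_server_mtime=server_mtime)
```

A longer freshness interval means fewer getattr calls but a wider window in which a stale block may be used, which is the trade-off between consistency and efficiency noted on the slide.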

  29. Achievements of NFS • Access and location transparency • Mobility transparency (partially) • Read-only file replication: automatic mounting • Fault-tolerance: stateless servers, automatic mounting • Efficiency: caching of disk blocks (main problem: frequent use of getattr) • Non-achievements: scalability, concurrency and consistency, security (Kerberos), ...

  30. The Andrew File System (AFS) • Developed at CMU • Current versions: AFS-2, AFS-3 • Compatible with NFS • Main achievement over (older) NFS: better scalability by minimizing client-server communication • Key characteristics: whole-file serving and caching (partial file caching allowed in AFS-3)

  31. Observations on UNIX File Usage • Files are mostly small • Read operations are more common • Sequential accesses are more common • Most files are written by one user • Files are referenced in bursts

  32. AFS Architecture Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  33. AFS File Name Space Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  34. System Call Interception in AFS Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  35. AFS System Calls Implementation Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  36. Cache Consistency • A callback promise is provided when Vice supplies a copy of file to a Venus process • The callback promise stored with the cached copy is in either valid or cancelled state • When Venus handles an open, it checks the cache.
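The callback mechanism can be sketched with two toy classes. The slide stops at "it checks the cache"; the behaviour assumed below is that Venus refetches the file when there is no cached copy or the promise is cancelled, and that Vice cancels the promises of other clients when a modified file is stored. The method names are hypothetical and are not those of the Vice service interface.

```python
class Vice:
    """Toy server: stores files and remembers who holds callback promises."""

    def __init__(self):
        self.files = {}      # name -> contents
        self.promises = {}   # name -> set of Venus processes holding a promise

    def fetch(self, name, venus):
        self.promises.setdefault(name, set()).add(venus)
        return self.files[name]

    def store(self, name, data, writer):
        self.files[name] = data
        # Cancel the promise of every other client that cached the file.
        for venus in self.promises.get(name, set()) - {writer}:
            venus.break_callback(name)
        self.promises[name] = {writer}


class Venus:
    """Toy client cache manager."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}      # name -> {"data": ..., "promise": "valid" | "cancelled"}
        self.fetches = 0

    def open(self, name):
        entry = self.cache.get(name)
        if entry is None or entry["promise"] == "cancelled":
            data = self.vice.fetch(name, self)
            self.fetches += 1
            self.cache[name] = {"data": data, "promise": "valid"}
        return self.cache[name]["data"]

    def close_after_write(self, name, data):
        self.cache[name] = {"data": data, "promise": "valid"}
        self.vice.store(name, data, self)

    def break_callback(self, name):
        if name in self.cache:
            self.cache[name]["promise"] = "cancelled"


vice = Vice()
vice.files["report"] = b"v1"
a, b = Venus(vice), Venus(vice)
first = a.open("report")     # fetched from Vice
again = a.open("report")     # served from the cache, promise still valid
b.open("report")
b.close_after_write("report", b"v2")   # Vice cancels a's promise
latest = a.open("report")    # promise cancelled, so a refetches
```

Between the first and last open, client `a` contacts the server only when its promise has been cancelled, which is how this design keeps client-server communication low.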

  37. The Vice Service Interface Source: Coulouris et al., Distributed Systems: Concepts and Design, Fourth Edition.

  38. Enhancements to NFS and AFS • Spritely NFS: adds open and close, uses callbacks • NQNFS (Not Quite NFS): uses callbacks and leases • WebNFS: allows browsers and other applications to interact with an NFS server directly • NFS Version 4 (RFC 3010, December 2000): incorporates all of the above and more • DCE/DFS (based on AFS): uses callbacks and write tokens (with a lifetime)

  39. New Features of NFS Version 4 • Adoption of the RPCSEC_GSS (RFC 2203) security protocol • Multiple operations in one request • Better migration and replication abilities: a client may query the location(s) of a file system • Introduction of open and close operations • Lease-based file locking • Callback-based delegation of files

  40. New Design Approaches • Background: high-performance storage technology (e.g., RAID); log-structured file systems (e.g., Sprite, BSD LFS); high-performance switched networks (e.g., ATM, high-speed Ethernet) • Goals: high scalability and fault-tolerance • Main ideas: distribute file data among many nodes, separate responsibilities, … • Constraints: high level of trust

  41. More Recent File System Designs • xFS: serverless; all data, metadata, and control can be located anywhere in the system; any machine can take over the responsibilities of a failed one • Frangipani: two-layer structure, consisting of the Petal distributed virtual disk system and the Frangipani server module • Both designs utilize RAID-style striping, log-structured file storage, etc.
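RAID-style striping with parity can be sketched in a few lines. This is a generic illustration, not the layout used by xFS or Petal: the fragment count, the zero padding, and the function names are assumptions. It shows a log segment split into equal fragments plus one XOR parity fragment, and the recovery of a lost fragment from the survivors.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(segment, data_fragments):
    """Split a segment into equal data fragments and compute a parity fragment."""
    size = -(-len(segment) // data_fragments)            # ceiling division
    padded = segment.ljust(size * data_fragments, b"\0")  # pad the last fragment
    fragments = [padded[i * size:(i + 1) * size] for i in range(data_fragments)]
    parity = reduce(xor_bytes, fragments)
    return fragments, parity


def recover(fragments, parity, lost_index):
    """Rebuild one lost data fragment from the parity and the surviving fragments."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)


segment = b"log segment written by one client"
fragments, parity = stripe(segment, data_fragments=3)
rebuilt = recover(fragments, parity, lost_index=1)
```

In a deployment each fragment would live on a different storage node, so the loss of any single node leaves the segment recoverable.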

  42. Log-based Striping in xFS Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996

  43. An xFS Configuration Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996

  44. A Frangipani Configuration Source: C.A. Thekkath et al., Frangipani, A Scalable Distributed File System, ACM SOSP 1997

  45. Storage Systems Source: G.A. Gibson and R. van Meter, Network Attached Storage Architecture, CACM, November 2000.

  46. NAS and SAN Note: the difference is disappearing. Source: G.A. Gibson and R. van Meter, Network Attached Storage Architecture, CACM, November 2000.

  47. Bandwidth for Disk Access Source: E. Riedel, Storage Systems, Queue, June 2003.

  48. Increasing the Bandwidth Source: E. Riedel, Storage Systems, Queue, June 2003.

  49. Virtualization in SAN Source: E. Riedel, Storage Systems, Queue, June 2003.

  50. Requirements for Storage Systems • Basic requirements: resource consolidation, rapid deployment, central management, convenient backup, high availability, data sharing. • Geographic separation • Security against an increasing risk of unauthorized access • Performance scalable with capacity (accesses per second or megabytes per second)
