Lecture 10 – Distributed File Systems

[Session 10] Lecture 10 – Distributed File Systems 염익준 (yeom@cs.kaist.ac.kr)

Contents • NFS Optimization • Andrew File System • Recent Advances in File Service

NFS Optimization - Server Caching • Similar to UNIX file caching for local files: • pages (blocks) from disk are held in a main memory buffer cache until the space is required for newer pages. Read-ahead and delayed-write optimizations are applied. • For local files, writes are deferred to the next sync event (at 30-second intervals) • This works well in the local context, where files are always accessed through the local cache, but in the remote case it doesn't offer the necessary synchronization guarantees to clients. • NFS v3 servers offer two strategies for updating the disk: • write-through - altered pages are written to disk as soon as they are received at the server. When a write() RPC returns, the NFS client knows that the page is on the disk. • delayed commit - pages are held only in the cache until a commit() call is received for the relevant file. This is the default mode used by NFS v3 clients. A commit() is issued by the client whenever a file is closed.
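The difference between the two strategies can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the class name `NfsServerCacheSketch`, the `write_through` flag, and the dictionaries standing in for the buffer cache and the disk are all hypothetical and are not part of the NFS protocol.

```python
class NfsServerCacheSketch:
    """Toy model of the two NFS v3 disk-update strategies (illustrative only)."""

    def __init__(self):
        self.cache = {}  # (file_id, page_no) -> bytes held in server memory
        self.disk = {}   # (file_id, page_no) -> bytes that reached stable storage

    def write(self, file_id, page_no, data, write_through=False):
        key = (file_id, page_no)
        self.cache[key] = data
        if write_through:
            # Write-through: the page is on disk before the write() reply is sent.
            self.disk[key] = data

    def commit(self, file_id):
        # Delayed commit: flush every cached page of this file to disk.
        for key, data in self.cache.items():
            if key[0] == file_id:
                self.disk[key] = data


server = NfsServerCacheSketch()
server.write("f1", 0, b"alpha", write_through=True)
server.write("f2", 0, b"beta")  # delayed commit: only cached so far
on_disk_before_commit = ("f2", 0) in server.disk
server.commit("f2")             # e.g. issued when the client closes f2
on_disk_after_commit = ("f2", 0) in server.disk
```

In this model a crash between `write()` and `commit()` would lose the page for `f2`, which is why the client only treats delayed-commit data as stable after `commit()` returns.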

NFS Optimization - Client Caching • Server caching does nothing to reduce RPC traffic between client and server • further optimization is essential to reduce server load in large networks • NFS client module caches the results of read, write, getattr, lookup and readdir operations • synchronization of file contents is not guaranteed when two or more clients are sharing the same file. • Timestamp-based validity check • reduces inconsistency, but doesn't eliminate it • validity condition for cache entries at the client: (T − Tc < t) ∨ (Tm_client = Tm_server), where t = freshness interval, Tc = time when the cache entry was last validated, Tm = time when the block was last updated at the server, T = current time • t is configurable (per file) but is typically set to 3 seconds for files and 30 seconds for directories • it remains difficult to write distributed applications that share files with NFS
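The validity condition can be written as a small function. This is a minimal sketch: the function name, its parameters, and the `fetch_tm_server` callable (standing in for a getattr() RPC) are hypothetical names chosen for illustration.

```python
def cache_entry_is_valid(now, last_validated, freshness_interval,
                         tm_client, fetch_tm_server):
    """Return True if the cached entry may be used without refetching.

    fetch_tm_server is a callable standing in for a getattr() RPC; it is
    only invoked once the freshness interval has expired.
    """
    if now - last_validated < freshness_interval:
        return True  # (T - Tc < t): trust the cache, no server contact
    # Otherwise compare the cached modification time with the server's.
    return tm_client == fetch_tm_server()


rpc_calls = []

def fake_getattr():
    # Hypothetical server that reports modification time 100.
    rpc_calls.append("getattr")
    return 100

fresh = cache_entry_is_valid(12, 10, 3, 100, fake_getattr)
stale_unchanged = cache_entry_is_valid(20, 10, 3, 100, fake_getattr)
stale_changed = cache_entry_is_valid(20, 10, 3, 90, fake_getattr)
```

The first check costs no RPC at all, which is where the traffic reduction comes from; the price is that a client can read stale data for up to t seconds after another client's update.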

Other NFS Optimizations • Sun RPC runs over UDP by default (can use TCP if required) • Uses UNIX BSD Fast File System with 8-kbyte blocks • read() and write() operations can be of any size (negotiated between client and server) • the guaranteed freshness interval t is set adaptively for individual files to reduce the getattr() calls needed to update Tm • file attribute information (including Tm) is piggybacked in replies to all file requests

NFS Performance • Early measurements (1987) established that: • write() operations are responsible for only 5% of server calls in typical UNIX environments • hence write-through at server is acceptable • lookup() accounts for 50% of operations • More recent measurements (1993) show high performance: • 1 x 450 MHz Pentium III: > 5000 server ops/sec, < 4 millisec. average latency • 24 x 450 MHz IBM RS64: > 29,000 server ops/sec, < 4 millisec. average latency • Provides a good solution for many environments including: • large networks of UNIX and PC clients • multiple web server installations sharing a single file store

NFS Summary (1/2) • An excellent example of a simple, robust, high-performance distributed service. • Achievement of transparencies: Access: Excellent; the API is the UNIX system call interface for both local and remote files. Location: Not guaranteed but normally achieved; naming of filesystems is controlled by client mount operations, but transparency can be ensured by an appropriate system configuration. Concurrency: Limited but adequate for most purposes; when read-write files are shared concurrently between clients, consistency is not perfect.

NFS Summary (2/2) Achievement of transparencies (continued): Replication: Limited to read-only file systems; for writable files, the SUN Network Information Service (NIS) runs over NFS and is used to replicate essential system files. Failure: Limited but effective; service is suspended if a server fails. Recovery from failures is aided by the simple stateless design. Mobility: Hardly achieved; relocation of filesystems is possible, but requires updates to client configurations. Scaling: Good; filesystems (file groups) may be subdivided and allocated to separate servers. Ultimately, the performance limit is determined by the load on the server holding the most heavily-used filesystem (file group).

The Andrew File System • Similar remote interface to NFS’s. • Two unusual design characteristics: • Whole-file serving: The entire contents of directories and files are transmitted to client computers by AFS servers (in AFS-3, files larger than 64 KB are transferred in 64 KB chunks). • Whole-file caching: Once a copy of a file or a chunk has been transferred to a client computer it is stored in a cache on the local disk. The cache is permanent, surviving reboots of the client computer.

Operation Scenario of AFS • When a user process in a client computer issues an open system call for a file in the shared file space and there is not a current copy of the file in the local cache, the server holding the file is located and is sent a request for a copy of the file. • The copy is stored in the local UNIX file system in the client computer; the copy is then opened and the resulting UNIX file descriptor is returned to the client. • Subsequent read, write and other operations on the file by processes in the client computer are applied to the local copy. • When the process in the client issues a close system call, if the local copy has been updated its contents are sent back to the server. The server updates the file contents and the timestamps on the file.
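The open/read/write/close scenario above can be sketched as follows. This is a simplified model under stated assumptions: `FileServerSketch` and `ClientCacheSketch` are hypothetical names, the file name doubles as the file descriptor, and cache validation is ignored entirely.

```python
class FileServerSketch:
    """Holds the authoritative copy of every file (illustrative only)."""

    def __init__(self):
        self.files = {}   # name -> bytes
        self.fetches = 0  # how many whole-file transfers were requested

    def fetch(self, name):
        self.fetches += 1
        return self.files[name]

    def store(self, name, data):
        self.files[name] = data


class ClientCacheSketch:
    """Client side: whole files are cached and operated on locally."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # name -> bytes, stands in for the local disk cache
        self.dirty = set()  # names modified since they were opened

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(name)  # whole-file transfer
        return name  # the name stands in for a UNIX file descriptor

    def read(self, fd):
        return self.cache[fd]  # served from the local copy

    def write(self, fd, data):
        self.cache[fd] = data  # applied to the local copy only
        self.dirty.add(fd)

    def close(self, fd):
        if fd in self.dirty:
            self.server.store(fd, self.cache[fd])  # send updates back on close
            self.dirty.discard(fd)


server = FileServerSketch()
server.files["notes.txt"] = b"v1"
client = ClientCacheSketch(server)

fd = client.open("notes.txt")
client.write(fd, b"v2")
server_copy_before_close = server.files["notes.txt"]
client.close(fd)
client.open("notes.txt")  # second open is served from the cache
```

Note that the server does not see the update until `close()`, and the second `open()` triggers no transfer.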

Assumptions and Observations • For shared files that are infrequently updated and for files that are normally accessed by only a single user, locally cached copies are likely to remain valid for long periods. • Observations from files in UNIX systems: • Files are small: most are less than 10 KB. • Read operations on files are much more common than writes (about six times more common). • Sequential access is common, and random access is rare. • Most files are read and written by only one user. When a file is shared, it is usually only one user who modifies it. • Files are referenced in bursts. • AFS is not well suited to implementing database systems.

Distribution of Processes in the AFS

File Name Space Seen by Clients

System Call Interception in AFS

File System Calls in AFS

Cache Consistency • Uses callback promises • What happens if a client receives a file and a callback-broken message for that file? • After rebooting, the client sends a validation request for its cached files. • Data will be lost if multiple clients try to update a file concurrently.

The Main Components of the Vice Service Interface
Fetch(fid) -> attr, data: Returns the attributes (status) and, optionally, the contents of the file identified by the fid and records a callback promise on it.
Store(fid, attr, data): Updates the attributes and (optionally) the contents of a specified file.
Create() -> fid: Creates a new file and records a callback promise on it.
Remove(fid): Deletes the specified file.
SetLock(fid, mode): Sets a lock on the specified file or directory. The mode of the lock may be shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid): Unlocks the specified file or directory.
RemoveCallback(fid): Informs the server that a Venus process has flushed a file from its cache.
BreakCallback(fid): This call is made by a Vice server to a Venus process. It cancels the callback promise on the relevant file.
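The interplay of Fetch, Store and BreakCallback can be sketched with a toy model. Everything here is an assumption made for illustration: the class names, the method names `open` and `close_after_update`, and the choice to let the writing client keep its own promise after a Store.

```python
class CallbackServerSketch:
    """Server side: remembers which clients hold a callback promise."""

    def __init__(self):
        self.data = {}      # fid -> bytes
        self.promises = {}  # fid -> set of clients holding a callback promise

    def fetch(self, fid, client):
        self.promises.setdefault(fid, set()).add(client)  # record the promise
        return self.data[fid]

    def store(self, fid, data, client):
        self.data[fid] = data
        holders = self.promises.get(fid, set())
        for other in holders - {client}:
            other.break_callback(fid)  # cancel every other client's promise
        self.promises[fid] = holders & {client}


class CallbackClientSketch:
    """Client side: a cached copy is usable only while its promise is valid."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # fid -> (data, promise_valid)

    def break_callback(self, fid):
        if fid in self.cache:
            data, _ = self.cache[fid]
            self.cache[fid] = (data, False)

    def open(self, fid):
        entry = self.cache.get(fid)
        if entry is None or not entry[1]:
            # No copy, or the promise was cancelled: fetch a fresh copy.
            self.cache[fid] = (self.server.fetch(fid, self), True)
        return self.cache[fid][0]

    def close_after_update(self, fid, data):
        self.cache[fid] = (data, True)
        self.server.store(fid, data, self)


server = CallbackServerSketch()
server.data["f"] = b"v1"
a = CallbackClientSketch(server)
b = CallbackClientSketch(server)
a.open("f")
b.open("f")
a.close_after_update("f", b"v2")
b_promise_valid = b.cache["f"][1]
b_sees = b.open("f")
```

After `a` stores the new version, `b` still has the old bytes on disk but its promise is marked cancelled, so its next `open()` refetches.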

Other Aspects • UNIX kernel modifications – intercepting file system calls. • Location database – each server contains a copy of a fully replicated location database giving a mapping of volume names to servers. • Read-only replicas • Bulk transfers – using 64 KB chunks. • Partial file caching – from AFS-3 onwards, file data can be transferred and cached in 64 KB blocks. • Performance • whole-file caching and the callback protocol reduce the load on the server drastically. • e.g. a server load of 40% was measured with eighteen client nodes running a standard benchmark, against a load of 100% for NFS running the same benchmark.

NFS Enhancement - WebNFS • WebNFS - NFS server implements a web-like service on a well-known port. Requests use a 'public file handle' and a pathname-capable variant of lookup(). Enables applications to access NFS servers directly, e.g. to read a portion of a large file.

NFS Enhancement - Achieving One-Copy Update Semantics • Stateless server architecture improves robustness, but precludes the achievement of precise one-copy update semantics and the use of callbacks. • Spritely NFS, NQNFS (Not Quite NFS) • The client calls open() with an operation mode whenever a local user-level process opens a file. • Upon receiving an open(), the server checks its open files table. • If the open specifies write mode, no further write-mode open is accepted, and other clients that have the file open for reading are told to invalidate any locally cached portions of the file. • If the open specifies read mode, the server sends a callback message to any client that is writing, instructing it to stop caching. • Performance was improved by a reduction in getattr() traffic.
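The open-table rules listed above can be sketched as follows. This is a rough model of the rules as stated on the slide only, not of any real Spritely NFS or NQNFS implementation: the class `OpenTableSketch`, the message strings, and the `messages` list (recording callbacks the server would send) are hypothetical.

```python
class OpenTableSketch:
    """Server-side table of open files, keyed by path (illustrative only)."""

    def __init__(self):
        self.readers = {}   # path -> set of client ids with the file open for reading
        self.writer = {}    # path -> client id with the file open for writing
        self.messages = []  # (client_id, message) callbacks the server would send

    def open(self, path, client_id, mode):
        readers = self.readers.setdefault(path, set())
        current_writer = self.writer.get(path)
        if mode == "write":
            if current_writer is not None and current_writer != client_id:
                return False  # a second write-mode open is refused
            for reader in sorted(readers - {client_id}):
                self.messages.append((reader, "invalidate-cache"))
            self.writer[path] = client_id
            return True
        if mode == "read":
            if current_writer is not None and current_writer != client_id:
                self.messages.append((current_writer, "stop-caching"))
            readers.add(client_id)
            return True
        raise ValueError("mode must be 'read' or 'write'")


table = OpenTableSketch()
first_read = table.open("/f", "A", "read")
first_write = table.open("/f", "B", "write")
second_write = table.open("/f", "C", "write")
later_read = table.open("/f", "D", "read")
```

The table is exactly the per-client state that a stateless NFS server avoids keeping, which is the trade-off the slide describes.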

Improvements in Disk Storage Organization • RAID - improves performance and reliability by striping data redundantly across several disk drives • Log-structured file storage - updated pages are stored contiguously in memory and committed to disk in large contiguous blocks (~ 1 Mbyte). File maps are modified whenever an update occurs. Garbage collection to recover disk space.
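One common way to stripe data redundantly is to add an XOR parity block per stripe, as parity-based RAID levels do. The sketch below assumes that scheme (the slide does not say which RAID level is meant), treats in-memory byte strings as disks, and uses hypothetical function names.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data, n_data_disks, block_size):
    """Split data across n_data_disks blocks and append one parity block."""
    if len(data) > n_data_disks * block_size:
        raise ValueError("data does not fit in a single stripe")
    padded = data.ljust(n_data_disks * block_size, b"\0")
    blocks = [padded[i * block_size:(i + 1) * block_size]
              for i in range(n_data_disks)]
    return blocks + [xor_blocks(blocks)]


def rebuild(stripe, lost_index):
    """Recover the block of one failed disk from all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = make_stripe(b"distributed!", n_data_disks=3, block_size=4)
recovered = rebuild(stripe, lost_index=1)
```

Reads of different blocks can proceed on different disks in parallel, and any single lost block (data or parity) can be recomputed from the others.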

New Design Approaches • Distribute file data across several servers • Exploits high-speed networks (ATM, Gigabit Ethernet) • Layered approach, lowest level is like a 'distributed virtual disk' • 'Serverless' architecture • Exploits processing and disk resources in all available network nodes • Service is distributed at the level of individual files • Examples: • xFS: Experimental implementation demonstrated a substantial performance gain over NFS and AFS • Frangipani: Performance similar to local UNIX file access • Peer-to-peer systems: Napster, OceanStore (UCB), Farsite (MSR), Publius (AT&T research)

Project 2 – FTP Proxy Server • Implement an FTP proxy server which hides individual FTP servers from clients. [Figure: an FTP client, the proxy server, and three FTP servers]

Directory Tree [Figure: home directory (at proxy server) containing App, Movie and Music, each holding files and directories, together with hidden App Backup, Movie Backup and Music Backup directories]

System Requirements • Server management • handling join, leave and failure. • Client management • individual user’s permission and priority management • managing access list based on host IP • Connection management • switching connection in case of a server failure • limiting bandwidth based on user’s priority
