
UNIX Internals – the New Frontiers



  1. UNIX Internals – the New Frontiers Distributed File Systems

  2. Difference between DOS and DFS • A distributed OS (DOS) looks like a centralized OS but runs simultaneously on multiple machines; it may provide a file system shared by all its host machines. • A distributed file system (DFS) is a software layer that manages communication between conventional operating systems and file systems.

  3. General Characteristics of DFS • Network transparency • Location transparency & Location independence • User Mobility • Fault tolerance • Scalability • File mobility

  4. Design Considerations • Name Space • Stateful or stateless • Semantics of sharing • UNIX semantics • Session semantics • Remote access method

  5. Network File System (NFS) • Based on the client-server model • Client and server communicate via remote procedure calls

  6. User Perspective • An NFS server exports one or more file systems • Hard mount: the client retries a request until it gets a reply • Soft mount: the client gives up after a timeout and returns an error • Spongy mount: hard for the mount itself, soft for I/O • Commands: • mount -t nfs nfssrv:/usr /usr • mount -t nfs nfssrv:/usr/u1 /u1 • mount -t nfs nfssrv:/usr /users • mount -t nfs nfssrv:/usr/local /usr/local

  7. Design goals • Not restricted to UNIX • Not dependent on any particular hardware • Simple recovery mechanisms • Transparent access to remote files • UNIX semantics • Performance comparable to that of a local disk • Transport independence

  8. NFS components • NFS protocol • RPC protocol • XDR (External Data Representation) • NFS server code • NFS client code • Mount protocol • Daemon processes (nfsd, mountd, biod) • NLM (Network Lock Manager) & NSM (Network Status Monitor)

  9. Statelessness • Each request is independent and self-contained • This makes crash recovery simple: • Client crash: the server is unaffected • Server crash: clients simply retransmit until the server recovers • Problem: • The server must commit all modifications to stable storage before replying to a request.

  10. 10.4 The protocol suite • Why XDR? Machines differ in their internal representation of data elements: • Byte order (little-endian vs. big-endian) • Sizes of types • XDR defines a typed, canonical encoding rather than an opaque byte stream

  11. XDR • Integers • 32 bits, big-endian (byte 0 is the most significant); signed integers use two's complement • Variable-length opaque data • Length (4 bytes), then the data, NUL-padded to a multiple of 4 bytes • Strings • Length (4 bytes), then the ASCII characters, NUL-padded to a multiple of 4 bytes • Arrays • Size (4 bytes), then elements of the same type • Structures • Components in their natural order
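  The integer, opaque, and string rules above can be sketched in a few lines of Python (illustrative only; the function names are ours, and a real implementation would use a generated XDR library):

```python
import struct

def xdr_int(n):
    """Encode a signed 32-bit integer: big-endian, two's complement."""
    return struct.pack(">i", n)

def xdr_opaque(data):
    """Variable-length opaque data: 4-byte length, then the bytes,
    NUL-padded to a multiple of 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad

def xdr_string(s):
    """A string is encoded like opaque data: length, ASCII bytes, padding."""
    return xdr_opaque(s.encode("ascii"))

# -1 encoded as two's complement, most significant byte first
assert xdr_int(-1) == b"\xff\xff\xff\xff"
# "nfs" -> 4-byte length 3, three bytes, one NUL pad byte
assert xdr_string("nfs") == b"\x00\x00\x00\x03nfs\x00"
```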

  12. RPC • Specifies the format of communications between the client and the server • SUN RPC: synchronous requests only • Implemented on UDP/IP • Authentication flavors to identify callers: • AUTH_NULL, AUTH_UNIX, AUTH_SHORT, AUTH_DES, and AUTH_KERB • RPC language compiler: rpcgen

  13. 10.5 NFS Implementation • Control Flow • Vnode • Rnode

  14. File Handle • The server assigns a file handle on lookup, create, or mkdir • Subsequent I/O operations use that handle • A file handle = an opaque 32-byte object = <file system ID, inode number, generation number> • The generation number detects stale handles: if the inode has been freed and reallocated to another file, the generations no longer match
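  A toy Python model of the generation-number check (the field layout and class names are hypothetical; a real handle is opaque and packed however the server likes):

```python
from collections import namedtuple

# Hypothetical in-memory view of the handle's three components.
FileHandle = namedtuple("FileHandle", "fsid inum generation")

class Server:
    def __init__(self):
        self.inodes = {}          # inode number -> current generation

    def create(self, fsid, inum):
        gen = self.inodes.get(inum, 0) + 1
        self.inodes[inum] = gen   # generation bumped on each reallocation
        return FileHandle(fsid, inum, gen)

    def lookup(self, fh):
        # A handle whose generation no longer matches is stale:
        # the inode was freed and reallocated to another file.
        if self.inodes.get(fh.inum) != fh.generation:
            return "ESTALE"
        return "OK"

srv = Server()
fh = srv.create(fsid=1, inum=42)
assert srv.lookup(fh) == "OK"
srv.create(fsid=1, inum=42)        # inode 42 reused for a new file
assert srv.lookup(fh) == "ESTALE"  # the old handle is now rejected
```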

  15. The mount operation • nfs_mount(): • Sends an RPC request whose argument is the pathname to mount • The mountd daemon translates the pathname and checks the export permissions • On success, the reply contains a file handle • The client initializes a vfs and records the server's name and address • Allocates the rnode & vnode of the root of the mounted file system • The server must still check access rights on each subsequent request

  16. Pathname Lookup • Client: • Initiates lookup during open, create & stat • Starting from the current or root directory, proceeds one component at a time • Sends a lookup request when the component is in an NFS directory • Server: • file handle -> FS ID -> vfs -> VGET -> vnode -> VOP_LOOKUP -> vnode & pointer • VOP_GETATTR -> VOP_FID -> file handle • Reply message = status + file handle + file attributes • Client: • Gets the reply, allocates an rnode + vnode, copies the information, and proceeds to search for the next component
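  The component-at-a-time loop on the client can be sketched as follows (a toy namespace; in this sketch a "handle" is just a path string, and each call to the stand-in `nfs_lookup` represents one round trip to the server):

```python
# Toy directory tree standing in for the server's namespace.
TREE = {"/": {"usr": "/usr"},
        "/usr": {"local": "/usr/local"},
        "/usr/local": {"bin": "/usr/local/bin"},
        "/usr/local/bin": {}}

def nfs_lookup(parent_fh, name):
    """Stand-in for one LOOKUP RPC: directory handle plus one
    pathname component in, child handle out."""
    entries = TREE[parent_fh]
    if name not in entries:
        raise FileNotFoundError(name)
    return entries[name]

def pathname_lookup(path):
    fh = "/"                    # handle of the root directory
    rpcs = 0
    for component in path.strip("/").split("/"):
        fh = nfs_lookup(fh, component)   # one round trip per component
        rpcs += 1
    return fh, rpcs

fh, rpcs = pathname_lookup("/usr/local/bin")
assert fh == "/usr/local/bin" and rpcs == 3   # three components, three RPCs
```

  The per-component round trips are exactly why the attribute-fetch overhead discussed under performance matters.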

  17. 10.6 UNIX Semantics • Statelessness leads to a few incompatibilities between NFS and UNIX: • Open-file permissions • UNIX checks permissions only at open • NFS checks them on each read and write • The NFS server always allows the owner of the file to read or write it • What if the file is write-protected after being opened for writing? • The client saves the attributes, including the file permissions, at open time

  18. Deletion of open files • The server has no idea which files are open • The client therefore renames a file that is deleted while still open to a hidden temporary name, and removes it on the last close • Deletion by a different machine remains unsolved
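  The rename-instead-of-delete trick can be modelled in Python (a toy sketch; the `.nfsXXXX` naming follows the common client convention, and the class and its fields are ours):

```python
class Client:
    """Toy model of the client-side 'silly rename' of open files."""
    def __init__(self):
        self.open_files = set()
        self.server_files = {"report"}   # files that exist on the server
        self.next_id = 0

    def open(self, name):
        self.open_files.add(name)
        return name

    def unlink(self, name):
        if name in self.open_files:
            # File is open locally: rename it to a hidden temporary
            # name instead of asking the server to remove it.
            tmp = ".nfs%04d" % self.next_id
            self.next_id += 1
            self.server_files.remove(name)
            self.server_files.add(tmp)
            self.open_files.remove(name)
            self.open_files.add(tmp)
            return tmp
        self.server_files.remove(name)   # not open: remove for real

    def close(self, name):
        self.open_files.discard(name)
        if name.startswith(".nfs"):
            self.server_files.discard(name)  # deferred delete at close

c = Client()
c.open("report")
tmp = c.unlink("report")
assert tmp in c.server_files       # data still reachable while open
c.close(tmp)
assert tmp not in c.server_files   # actually deleted on close
```

  Because only the deleting client knows about the rename, an unlink issued from a different machine bypasses the trick entirely, which is the unsolved case noted above.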

  19. Reads and Writes • UNIX locks the vnode for the duration of an I/O operation • NFS clients can lock the vnode only on the same machine • NFS offers no protection against overlapping I/O requests from different clients • Locking through the NLM (Network Lock Manager) protocol is only advisory

  20. 10.7 NFS Performance • Bottlenecks • Writes must be committed to stable storage • Fetching of file attributes requires one RPC call per file • Processing retransmitted requests adds to the load on the server

  21. Client-side caching • Caches both file blocks and file attributes • To limit the use of invalid data: • The kernel keeps an expiry time with each cache entry • After 60 seconds, the client rechecks the file's modification time on the server • This reduces, but does not eliminate, the problem
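  A minimal sketch of the expiry-time check, with an injected clock so the 60-second timeout can be exercised deterministically (all names are ours):

```python
import time

ATTR_TIMEOUT = 60.0   # seconds before cached attributes must be rechecked

class AttrCache:
    """Sketch of a client-side attribute cache with per-entry expiry."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.cache = {}           # file handle -> (attrs, expiry time)

    def store(self, fh, attrs):
        self.cache[fh] = (attrs, self.clock() + ATTR_TIMEOUT)

    def lookup(self, fh):
        entry = self.cache.get(fh)
        if entry is None:
            return None
        attrs, expiry = entry
        if self.clock() > expiry:
            return None           # stale: caller must issue a GETATTR RPC
        return attrs

# Fake clock so the example is deterministic.
now = [0.0]
cache = AttrCache(clock=lambda: now[0])
cache.store("fh1", {"size": 512})
assert cache.lookup("fh1") == {"size": 512}   # fresh: no RPC needed
now[0] = 61.0
assert cache.lookup("fh1") is None            # expired: recheck the server
```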

  22. Deferral of writes • Asynchronous writes for full blocks • Delayed writes for partial blocks • Delayed writes are flushed on close, or every 30 seconds by the biod daemon • The server may use an NVRAM buffer and flush the buffer to disk later • Write-gathering: • The server waits briefly, gathers several write requests to the same file, processes them together, and then replies to each

  23. The retransmissions cache • Idempotent requests can safely be repeated; nonidempotent requests (such as remove) cannot • Problem scenario: • The server processes a remove and sends a success reply, but the reply is lost • The client retransmits the remove • The server processes the retransmitted remove, which now fails • The client receives a spurious error for an operation that succeeded • Solution: a retransmissions cache on the server • Keyed by xid, procedure number, & client ID • Originally consulted only when a request fails

  24. New implementation • Caches all requests • Each entry holds xid, procedure number, client ID, a state field & a timestamp • If the request is in progress, discard the retransmission; if it is done, discard the retransmission when the timestamp falls in the throwaway window (3-6 s) • Otherwise process the request if it is idempotent • For nonidempotent requests, check whether the file has been modified; if not, send a success reply; otherwise, retry the request
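  The cache-lookup logic of the new implementation can be sketched as follows (a simplification: a real server also bounds the cache's size and applies the idempotent/nonidempotent distinction described above):

```python
IN_PROGRESS, DONE = "in_progress", "done"
THROWAWAY = (3.0, 6.0)    # seconds: window right after a reply was sent

class DupCache:
    """Sketch of a server-side retransmissions cache,
    keyed by (xid, procedure number, client ID)."""
    def __init__(self, clock):
        self.clock = clock
        self.entries = {}     # key -> (state, timestamp)

    def handle(self, key):
        entry = self.entries.get(key)
        if entry:
            state, ts = entry
            age = self.clock() - ts
            if state == IN_PROGRESS:
                return "discard"     # original request still being served
            if state == DONE and THROWAWAY[0] <= age <= THROWAWAY[1]:
                return "discard"     # reply just sent; drop the duplicate
        self.entries[key] = (IN_PROGRESS, self.clock())
        return "process"

    def done(self, key):
        self.entries[key] = (DONE, self.clock())

now = [0.0]
cache = DupCache(clock=lambda: now[0])
key = ("xid1", "REMOVE", "clientA")
assert cache.handle(key) == "process"   # first arrival: process it
assert cache.handle(key) == "discard"   # retransmission while in progress
cache.done(key)
now[0] = 4.0
assert cache.handle(key) == "discard"   # inside the 3-6 s throwaway window
```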

  25. 10.9 NFS Security • NFS Access Control • Performed at mount time and on each request • Controlled by an exports list • Mount: the server checks the list and denies ineligible clients • Request: carries authentication information, in AUTH_UNIX form (UID, GID) • Loophole: an imposter can present someone else's <UID, GID> to access their files

  26. UID Remapping • A translation map for each client • The same client UID may map to a different UID on the server • UIDs with no entry in the map become nobody • May be implemented at the RPC level • Or at the NFS level, by merging the map into the /etc/exports file

  27. Root Remapping • Maps the superuser to nobody • Limits what the client's superuser can access on the server • The UNIX security framework was designed for an isolated, multi-user environment in which the users trust each other.
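  Both UID remapping and root remapping amount to a table lookup with a nobody fallback; a sketch (the map contents are invented, and 65534 is merely the conventional uid for nobody):

```python
NOBODY = 65534   # conventional uid of "nobody"

def remap_uid(client_map, uid, root_squash=True):
    """Sketch of server-side remapping of one client's credentials.
    client_map: client uid -> server uid."""
    if root_squash and uid == 0:
        return NOBODY                    # superuser squashed to nobody
    return client_map.get(uid, NOBODY)   # unmapped uids also get nobody

alice_map = {1001: 2001, 1002: 2002}     # hypothetical per-client map
assert remap_uid(alice_map, 1001) == 2001   # mapped to a server uid
assert remap_uid(alice_map, 9999) == NOBODY # no entry in the map
assert remap_uid(alice_map, 0) == NOBODY    # root remapping
```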

  28. 10.10 NFS Version 3 • Commit request • When a client writes, its kernel sends asynchronous WRITE requests • The server may save the data in its local cache and reply immediately • The client holds a copy of the data until the process closes the file and sends a COMMIT request • On COMMIT, the server flushes the data to disk • File length: • From 32 bits (4 GB) to 64 bits • READDIRPLUS = LOOKUP + GETATTR • Returns names, file handles, and file attributes in one request

  29. Other DFS • The Andrew File System (10.15 – 10.17) • The DCE Distributed File System (10.18 – 10.18.5)
