
NFS & Distributed Systems Issues

Vivek Pai, Dec 6, 2001


Presentation Transcript


  1. NFS & Distributed Systems Issues Vivek Pai Dec 6, 2001

  2. Mechanics • A few words about Project 5 • It’s not just another webserver project

  3. The Next Project • Behavioral spec • Implementation up to you • Can assume max of 128 procs/threads • Use a simple counter to implement simple counts • I may release a tool to make testing easier

  4. Behavioral Spec The following behavioral spec is important • If there aren’t enough free processes/threads, the server should spawn one per second • If there are too many free, one should be killed per second • This should not depend on any other activity in the system
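The spawn/kill rule above can be read as a decision made once per second. The following is a minimal sketch, not the project's required design: the thresholds `min_free` and `max_free` are hypothetical (the slide gives no values), and the cap of 128 comes from the previous slide.

```python
MAX_WORKERS = 128  # cap on procs/threads stated for the project


def pool_adjustment(total, free, min_free, max_free):
    """Return +1 to spawn one worker, -1 to kill one, 0 to do nothing.

    Meant to be called once per second from a timer, so the rate of
    change does not depend on request activity. min_free and max_free
    are hypothetical thresholds chosen by the implementer.
    """
    if free < min_free and total < MAX_WORKERS:
        return 1
    if free > max_free and total > 0:
        return -1
    return 0
```

Driving this from a timer (for example, one based on the alarm and signal facilities listed on the man-page slide) is one way to keep the adjustment independent of other activity in the system.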

  5. Caching Mmap • Always use mmap • Keep cache of active & inactive maps • Total cache size in KB should be limited by command-line argument • Can only exceed this limit if all mappings are active
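The cache rule above (a size limit that may be exceeded only when every mapping is active) can be sketched with bookkeeping alone. This is an illustrative assumption, not the assignment's reference design: the class `MapCache`, its method names, and the least-recently-used eviction order are all hypothetical, and the sketch tracks sizes without calling mmap.

```python
from collections import OrderedDict


class MapCache:
    """Tracks mappings by size in KB; does not call mmap itself."""

    def __init__(self, limit_kb):
        self.limit_kb = limit_kb
        # path -> [size_kb, active_refs], least recently used first
        self.entries = OrderedDict()

    def total_kb(self):
        return sum(size for size, _ in self.entries.values())

    def acquire(self, path, size_kb):
        """Mark a mapping active, adding it if absent, then evict if needed."""
        if path in self.entries:
            self.entries[path][1] += 1
            self.entries.move_to_end(path)
        else:
            self.entries[path] = [size_kb, 1]
        self._evict()

    def release(self, path):
        """Drop one reference; at zero the mapping stays cached as inactive."""
        self.entries[path][1] -= 1
        self._evict()

    def _evict(self):
        # Remove inactive mappings, oldest first, until under the limit.
        # Active mappings are never evicted, so the limit can be exceeded
        # only when every remaining mapping is active.
        for path in list(self.entries):
            if self.total_kb() <= self.limit_kb:
                break
            if self.entries[path][1] == 0:
                del self.entries[path]  # a real server would munmap here
```

Keeping inactive mappings around until space is needed is what lets a later request for the same file skip the mmap call.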

  6. Man Pages You May Like • mmap, munmap • man -k pthread • flock • sleep • signal • alarm

  7. Being A Good User • Do not fork wildly • Try to test on non-shared system

  8. Imagine The Following • Everyone has a desktop machine • Each machine has a user • Each user has a home directory • What problems arise? • Can’t move between machines • Can’t easily share files with others • How does this data get backed up?

  9. Was It Always Like This? • No • Think mainframes: • Big, centralized box • All disks attached • Programs ran on box • Only terminals/monitors on each desk

  10. How Did We Get Here? • Mainframe killers advocated little boxes • Lots of little boxes are a distributed system • Distributed systems introduce new problems

  11. Why Use Little Boxes? • Little boxes are cheap • Easier to order a PC than a mainframe • Little boxes are disposable • No need for a maintenance contract • Economy of scale • Design cost amortized over more units

  12. Were Minis Immune? • Minicomputers were “department”-sized versus “company”-sized • Most information not shared among everyone • Administrator per department OK • Shared resources only within department OK

  13. Why Not Just Shared Disk? • Centralized storage • Easier administration/backup • Better use of capacity • Easier to build large filesystem cache • Easier to provide AC/power • Problem: compare bandwidth • 10 Mbit/sec Ethernet at the time • Switched versus shared irrelevant

  14. New Problem • Single point of failure • Means everything depends on this item • In other cases, duplication helps • Common failures = reboot • But all information (state) lost • All clients would have to be told • We’d need to keep track of all clients • On stable storage!

  15. Toward Statelessness • Make server as dumb as possible • Shift burdens to client-side • Client failure only harms that client • Each operation is self-contained • Repeating operations permissible • Idempotent: repeating the operation leaves the same result as doing it once

  16. Idempotency • Regular Unix system call • write(fd, buf, size) • Writes size bytes at current position, moves position forward by size • Idempotent version • pwrite(fd, buf, size, offset) • Writes size bytes at the given offset, leaves current position unchanged • Idempotent operations in NFS hidden from user programs
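The difference between the two calls shows up when the same request is applied twice, as happens when a client retransmits after a lost reply. A minimal sketch using Python's `os` wrappers for the same system calls; the helper name `replay_twice` is made up for illustration.

```python
import os
import tempfile


def replay_twice(use_pwrite):
    """Apply the same 5-byte write twice and return the file contents."""
    fd, path = tempfile.mkstemp()
    try:
        for _ in range(2):  # the second pass simulates a retransmitted request
            if use_pwrite:
                os.pwrite(fd, b"hello", 0)  # offset travels with the request
            else:
                os.write(fd, b"hello")      # offset is hidden state in the fd
        return os.pread(fd, 100, 0)
    finally:
        os.close(fd)
        os.unlink(path)
```

With `write`, the replay appends a second copy because the file position moved; with `pwrite`, the replay overwrites the same bytes and the file is unchanged.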

  17. Distributed Caching • Local filesystems have caches • Use caches to offload network traffic • Same object replicated in many caches • No problem for reads • What happens on write/update? • Multiple different copies of data? • What happens if it’s metadata?

  18. Distributed Write Problem • Possible approaches • Disallow caching on writes • What about emacs? • Disallow caching of shared files • What happens for really big files? • Disallow caching of metadata writes • What disk blocks does OS care about?

  19. Sun’s Write Philosophy • File block write sharing not an issue • Very few programs do it • Correctness depends on program • Reduce window of opportunity • Flush dirty blocks periodically • Flush can be asynchronous
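The "window of opportunity" above is the time between a client's write and its next flush. A toy sketch under stated assumptions: the class `WriteBackCache` is hypothetical, a plain dict stands in for the server, and `flush` is called explicitly where a real client would run it periodically and possibly asynchronously.

```python
class WriteBackCache:
    """Client-side block cache that writes dirty blocks back on flush."""

    def __init__(self, server):
        self.server = server  # dict: block number -> bytes (stand-in server)
        self.blocks = {}      # locally cached blocks
        self.dirty = set()    # block numbers modified since the last flush

    def write(self, blockno, data):
        self.blocks[blockno] = data
        self.dirty.add(blockno)

    def read(self, blockno):
        if blockno not in self.blocks:
            self.blocks[blockno] = self.server.get(blockno, b"")
        return self.blocks[blockno]

    def flush(self):
        # Push every dirty block to the server, then mark all clean.
        for blockno in sorted(self.dirty):
            self.server[blockno] = self.blocks[blockno]
        self.dirty.clear()
```

Until the writer flushes, another client reading the same block sees the old contents; flushing more often shrinks that window but does not remove it, which is why correctness is left to the program.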

  20. Metadata Operations • Performed synchronously at server • Must be reflected to disk • Why: stability • Overhead: disk op + network • Can we speed up synchronous ops?

  21. New Statelessness Problems • Stale file handle problem • cd ~vivek/temp1/temp in window A • rm -r ~vivek/temp1 in window B • “ls” in window A • Stale inode problem • Machine A gets file for read • Filesystem reformatted by admin • Machine A modifies file, tries to write
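Because the server keeps no record of which clients hold which handles, it can only validate a handle when it is presented. One way to picture this, as an assumption rather than a description of any particular NFS implementation, is a handle made of an inode number plus a generation counter; `ToyServer` and its methods are hypothetical names.

```python
import errno


class ToyServer:
    """Keeps no per-client state: it only checks each handle it is given."""

    def __init__(self):
        self.generation = {}  # inode number -> current generation
        self.data = {}        # inode number -> contents

    def create(self, inode, contents):
        # Reusing an inode number bumps its generation.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.data[inode] = contents
        return (inode, self.generation[inode])  # the file handle

    def remove(self, inode):
        del self.data[inode]

    def read(self, handle):
        inode, gen = handle
        if inode not in self.data or self.generation[inode] != gen:
            raise OSError(errno.ESTALE, "stale file handle")
        return self.data[inode]
```

A client still holding the old handle after the file is removed, or after the inode number is reused, gets a stale-handle error on its next request instead of silently reading a different file.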

  22. What Slows Down Servers • Network overhead • Disk DMA in 4 KB pieces • Network processing in 1500-byte packets + manipulation • Multiple CPUs • Synchronous operations • Nonvolatile memory + recovery
