Distributed Systems Implementation Issues & Solutions

NFS & Distributed Systems Issues Vivek Pai Dec 12, 2002

The Next Project • Behavioral spec • Implementation up to you • Can assume max of 32 procs/threads • Use a simple counter to implement simple counts • I may release a tool to make testing easier • But feel free to use ApacheBench, etc

Behavioral Spec The following behavioral spec is important • If there aren’t enough free processes/threads, the server should spawn one per second • If there are too many free, one should be killed per second • This should not depend on any other activity in the system
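
The one-per-second rule above can be sketched as a controller that makes at most one change per timer tick. This is only an illustration: the thresholds `min_free` and `max_free` and the helper names are hypothetical (the spec does not say what counts as "enough" or "too many"), and the ticks are simulated instead of being driven by a real timer.

```python
def pool_action(free, min_free, max_free):
    """Decide what to do on one tick; at most one change per tick."""
    if free < min_free:
        return "spawn"
    if free > max_free:
        return "kill"
    return "none"


def simulate(total, busy, ticks, min_free=2, max_free=4, max_total=32):
    """Run the controller for a number of one-second ticks under fixed load.

    min_free and max_free are assumed thresholds; max_total is the
    32 procs/threads ceiling from the project description.
    """
    history = []
    for _ in range(ticks):
        action = pool_action(total - busy, min_free, max_free)
        if action == "spawn" and total < max_total:
            total += 1
        elif action == "kill":
            total -= 1
        history.append(total)
    return history
```

In a real server the tick would come from a timer (for example `alarm` or a thread that sleeps for one second), so that the adjustment runs even when no requests arrive.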

Caching Mmap • Always use mmap • Keep cache of active & inactive maps • Total cache size in KB should be limited by command-line argument • Can only exceed this limit if all mappings are active
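
One possible shape for such a cache is sketched below. The class name `MapCache`, its methods, and the oldest-first eviction order are assumptions made for illustration; the slide only requires that inactive maps count against the limit and that the limit may be exceeded when every mapping is active. The sketch also assumes non-empty files, since an empty file cannot be mapped.

```python
import mmap
import os
from collections import OrderedDict


class MapCache:
    """Cache of mmap'd files, split into active (in use) and inactive maps."""

    def __init__(self, limit_kb):
        self.limit = limit_kb * 1024      # byte budget (command-line argument)
        self.total = 0                    # bytes currently mapped
        self.active = {}                  # path -> [mmap object, reference count]
        self.inactive = OrderedDict()     # path -> mmap object, oldest first

    def _shrink(self, extra):
        """Unmap inactive maps, oldest first, until total + extra fits."""
        while self.inactive and self.total + extra > self.limit:
            _, old = self.inactive.popitem(last=False)
            self.total -= len(old)
            old.close()                   # releases the mapping (munmap)

    def acquire(self, path):
        """Return a mapping for path, reusing a cached one when possible."""
        if path in self.active:
            self.active[path][1] += 1
            return self.active[path][0]
        if path in self.inactive:
            m = self.inactive.pop(path)   # reactivate without remapping
            self.active[path] = [m, 1]
            return m
        size = os.path.getsize(path)
        self._shrink(size)
        # If only active maps remain, the limit may be exceeded here.
        with open(path, "rb") as f:
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self.total += size
        self.active[path] = [m, 1]
        return m

    def release(self, path):
        """Drop one reference; an unused map moves to the inactive set."""
        entry = self.active[path]
        entry[1] -= 1
        if entry[1] == 0:
            del self.active[path]
            self.inactive[path] = entry[0]
            self._shrink(0)               # get back under the limit if possible
```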

Man Pages You May Like • mmap, munmap • man -k pthread • flock/lockf • sleep • signal • alarm

Being A Good User • Do not fork wildly • Try to test on a non-shared system

Imagine The Following • Everyone has a desktop machine • Each machine has a user • Each user has a home directory • What problems arise? • Can’t move between machines • Can’t easily share files with others • How does this data get backed up?

Was It Always Like This? • No • Think mainframes: • Big, centralized box • All disks attached • Programs ran on box • Only terminals/monitors on each desk

How Did We Get Here? • Mainframe killers advocated little boxes • Lots of little boxes are a distributed system • Distributed systems introduce new problems

Why Use Little Boxes? • Little boxes are cheap • Easier to order a PC than a mainframe • Little boxes are disposable • No need for a maintenance contract • Economy of scale • Design cost amortized over more units

Were Minis Immune? • Minicomputers were “department”-sized versus “company”-sized • Most information not shared among everyone • Administrator per department OK • Shared resources only within department OK

Why Not Just Shared Disk? • Centralized storage • Easier administration/backup • Better use of capacity • Easier to build large filesystem cache • Easier to provide AC/power • Problem: compare bandwidth • 10 Mbit/sec Ethernet at the time • Switched versus shared irrelevant
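
The bandwidth comparison can be made concrete with a little arithmetic. The sketch below only converts units; the file size and client count in the usage note are hypothetical values chosen for illustration, not figures from the lecture.

```python
def transfer_seconds(size_bytes, link_mbit_per_sec):
    """Time to move size_bytes over a link, ignoring protocol overhead."""
    bits = size_bytes * 8
    return bits / (link_mbit_per_sec * 1_000_000)


def per_client_mbit(server_link_mbit, active_clients):
    """Fair share of the server's link when all clients are busy at once."""
    return server_link_mbit / active_clients
```

For example, `transfer_seconds(10_000_000, 10)` gives 8.0 seconds for a 10 MB file, and `per_client_mbit(10, 20)` gives 0.5 Mbit/sec each for 20 busy clients. The second function also suggests why switching does not help: every request still crosses the server's single 10 Mbit/sec link.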

New Problem • Single point of failure • Means everything depends on this item • In other cases, duplication helps • Common failures = reboot • But all information (state) lost • All clients would have to be told • We’d need to keep track of all clients • On stable storage!

Toward Statelessness • Make server as dumb as possible • Shift burdens to client-side • Client failure only harms that client • Each operation is self-contained • Repeating operations permissible • Idempotent – repeating causes no change
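
A toy illustration of self-contained operations: every request below carries the handle, offset, and count, so the server remembers nothing between calls and can be restarted at any point. The names `StatelessServer` and `client_read_all` are hypothetical, and a dict stands in for the server's disk.

```python
class StatelessServer:
    """Serves reads without remembering anything about its clients."""

    def __init__(self, storage):
        self.storage = storage            # stands in for the server's disk

    def read(self, handle, offset, count):
        # Everything needed to serve the request is in the request itself.
        return self.storage[handle][offset:offset + count]


def client_read_all(server_factory, handle, chunk):
    """The client tracks its own offset; the server may restart between calls."""
    offset, out = 0, b""
    while True:
        server = server_factory()         # simulate a reboot before every call
        data = server.read(handle, offset, chunk)
        if not data:
            return out
        out += data
        offset += len(data)
```

Because the position lives on the client, a server reboot between two reads costs nothing but time.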

Idempotency • Regular Unix system call • write(fd, buf, size) • Writes size bytes at current position, moves position forward by size • Idempotent version • pwrite(fd, buf, size, offset) • Idempotent operations in NFS hidden from user programs
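
The difference shows up when an operation is retried, as a client might do after a lost reply. The sketch below uses Python's `os.write` and `os.pwrite` wrappers around the two system calls (Unix only); the helper names are made up for illustration.

```python
import os


def retry_write(path, data, attempts):
    """Repeat write(): each call advances the file position."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(attempts):
            os.write(fd, data)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()


def retry_pwrite(path, data, offset, attempts):
    """Repeat pwrite(): the explicit offset makes every repeat land in place."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(attempts):
            os.pwrite(fd, data, offset)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()
```

Retrying `write` twice with `b"abc"` leaves `b"abcabc"` in the file, while retrying `pwrite` at offset 0 leaves `b"abc"`.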

Distributed Caching • Local filesystems have caches • Use caches to offload network traffic • Same object replicated in many caches • No problem for reads • What happens on write/update? • Multiple different copies of data? • What happens if it’s metadata?

Distributed Write Problem • Possible approaches • Disallow caching on writes • What about emacs? • Disallow caching of shared files • What happens for really big files? • Disallow caching of metadata writes • What disk blocks does OS care about?

Sun’s Write Philosophy • File block write sharing not an issue • Very few programs do it • Correctness depends on program • Reduce window of opportunity • Flush dirty blocks periodically • Flush can be asynchronous
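
A minimal sketch of a client-side write-back cache with periodic flushing follows. The class name, the dict standing in for the server, and the interval used in the usage note are all assumptions. For simplicity the flush here is synchronous, although the slide notes it can be asynchronous.

```python
class WriteBackCache:
    """Holds dirty blocks locally and pushes them to the server periodically."""

    def __init__(self, server, flush_interval):
        self.server = server              # dict: block number -> bytes
        self.flush_interval = flush_interval
        self.dirty = {}                   # blocks written but not yet sent
        self.last_flush = 0

    def write(self, block, data):
        # The write completes locally; other clients cannot see it yet.
        self.dirty[block] = data

    def tick(self, now):
        """Called from a timer; flushes once the interval has elapsed."""
        if now - self.last_flush >= self.flush_interval:
            self.server.update(self.dirty)
            self.dirty.clear()
            self.last_flush = now
```

With a hypothetical interval of 30 seconds, a block written at time 0 is invisible at the server when `tick(10)` runs and visible after `tick(30)`. The interval bounds the window in which two clients can hold different copies.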

Metadata Operations • Performed synchronously at server • Must be reflected to disk • Why: stability • Overhead: disk op + network • Can we speed up synchronous ops?
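
To illustrate "must be reflected to disk", here is a sketch of how a server-side handler might make a file creation durable before replying. The function name is hypothetical, and calling `fsync` on both the file and its directory is one common Unix approach, assumed here and not taken from the lecture.

```python
import os


def durable_create(directory, name):
    """Create an empty file and force the change to disk before returning."""
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.fsync(fd)                      # flush the new file's metadata
    finally:
        os.close(fd)
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)                  # flush the new directory entry
    finally:
        os.close(dir_fd)
    return path                           # only now would the server reply
```

The client waits for both disk operations plus the network round trip, which is the overhead the slide refers to.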

New Statelessness Problems • Stale file handle problem • cd ~vivek/temp1/temp in window A • rm -r ~vivek/temp1 in window B • “ls” in window A • Stale inode problem • Machine A gets file for read • Filesystem reformatted by admin • Machine A modifies file, tries to write
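
One common way for a stateless server to detect stale handles, which this lecture does not spell out, is to put a generation number in each file handle and bump it whenever an inode number is reused. The sketch below assumes that design; the class and method names are hypothetical.

```python
import errno


class HandleServer:
    """Toy server whose file handles are (inode number, generation) pairs."""

    def __init__(self):
        self.files = {}                   # inode number -> (generation, data)
        self.last_gen = {}                # inode number -> last generation used

    def create(self, ino, data):
        gen = self.last_gen.get(ino, 0) + 1
        self.last_gen[ino] = gen
        self.files[ino] = (gen, data)
        return (ino, gen)                 # the handle given to the client

    def remove(self, ino):
        del self.files[ino]

    def read(self, handle):
        ino, gen = handle
        entry = self.files.get(ino)
        if entry is None or entry[0] != gen:
            # The file is gone, or the inode now holds a different file.
            raise OSError(errno.ESTALE, "stale file handle")
        return entry[1]
```

A client that still holds the old handle after `remove` gets `ESTALE` instead of silently reading whatever file later reuses the same inode number.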

What Slows Down Servers • Network overhead • Disk DMA in 4KB pieces • Network processing in 1500 byte packets + manipulation • Multiple CPUs • Synchronous operations • Nonvolatile memory + recovery
