Languages for Distributed Systems
• Critical regions
• Transactions
• Fault-tolerance
• Automata
Computing Systems
Languages vs. Distributed Systems
• Goals of programming language design
  • Make it easy to program
  • Make it obvious that a program is correct
  • Design the language for the problem
  • Programs should be portable
  • Programs should be easy to maintain
• Goals of distributed systems
  • Performance
  • Availability, reliability, fault-tolerance
Principles of language design
• The language should provide a logical basis for design
• The compiler should worry about the details
• In particular, the program should not include architecture-specific details
Sequential programming
• C, Java, Pascal, FORTRAN, ...
• Strictly ordered execution
• Well-defined memory/resource model
• Side effects and weak compilers make it difficult to write correct code
• Compiler optimizations are hard
Functional programming
• ML, Lisp, Haskell, ... (without side effects)
• Execution order does not matter
• Well-defined memory/resource model
• Easier to get correct code
• Some data structures are hard to implement (e.g. splay trees)
• Compiler optimizations are relatively easy
Issues
• How do we exploit parallelism without making the programs obscure?
• SIMD machines
Dataflow machines
Dataflow
• A dataflow graph can be constructed from a serial program
• Processor partitioning tends to require fine-grain parallelism
• Requires the programmer to draw the dataflow graph explicitly
Traditional path
• Basic abstraction: processes
• A process is a program in execution
  • Logically, it has its own machine, including a CPU, infinite memory, and devices
• A thread is a process that shares its address space with another process
Threads
Building a file server
• Single-threaded server
  • Server waits for a request
  • When a request is received, the server performs the operation, then returns the result
  • Performance is poor, because the server is idle during file operations
• Multi-threaded server
  • Server contains several worker threads and a dispatcher
  • When a request arrives, the dispatcher assigns the job to a worker thread, then sleeps
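The multi-threaded design above can be sketched in Python: a dispatcher feeds requests into a queue, and worker threads pull jobs off it and perform the blocking file operation. The `serve`/`worker` names and the request format are invented for illustration.

```python
import queue
import threading

def read_file(path):
    # The blocking file operation a worker performs for one request.
    with open(path, "rb") as f:
        return f.read()

def worker(jobs, results):
    while True:
        path = jobs.get()
        if path is None:            # sentinel: shut this worker down
            break
        results.put((path, read_file(path)))

def serve(requests, n_workers=4):
    jobs, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for path in requests:           # the "dispatcher": hand each request off
        jobs.put(path)
    for _ in workers:               # one sentinel per worker
        jobs.put(None)
    for w in workers:
        w.join()
    out = {}
    while not results.empty():
        path, data = results.get()
        out[path] = data
    return out
```

While one worker blocks inside `read_file`, the other workers keep serving requests, which is exactly the idle-time problem the single-threaded design suffers from.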
Threaded file server
State-machine file server
• Server is a reactive state machine
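As a contrast with the threaded design, a minimal sketch of the reactive state-machine style: per-request state is kept explicitly in a table and advanced on each incoming event, rather than being held on a thread's stack. The event names and the handler are invented for illustration.

```python
def state_machine_server(events):
    # Explicit per-request state, advanced one event at a time.
    state = {}                      # request id -> current state
    log = []
    for req_id, event in events:
        if event == "REQUEST":
            state[req_id] = "READING"    # issue an async read; do not block
        elif event == "DISK_DONE":
            log.append(("reply", req_id))  # read finished: send the reply
            del state[req_id]
    return log
```

Because nothing blocks, one thread can interleave many requests; the cost is that the "where was I?" information a thread gets for free must be stored by hand.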
Advantages of threads
• Threads provide parallelism
• They maintain the illusion of sequential control
• Blocking system calls can be used naturally
• Special handling is needed only for inter-thread communication
Implementing threads
• User-space threads
  • Kernel is unaware of threading
  • Runtime takes care of switching between threads
• When a thread calls a blocking procedure (like a system call):
  • It calls a wrapper in the runtime
  • If the call is going to block, the runtime saves the registers, then loads the registers and stack pointer for another thread
• Fast
User-space threads
• Advantages
  • Fast
  • Each process can have its own scheduler
• Disadvantages
  • Kernel has to provide a mechanism for non-blocking system calls, OR all calls have to be checked with a select(2) operation
  • Page faults block all threads
  • Threads must voluntarily relinquish control; busy-waiting is not allowed
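The voluntary-relinquish point above can be illustrated with a toy user-space scheduler: "threads" are Python generators, and `yield` is the explicit point where a thread gives up control. As the slide notes, there is no preemption; a thread that never yields starves everyone else.

```python
from collections import deque

def scheduler(threads):
    # Round-robin over cooperating "threads" (generators).
    ready = deque(threads)
    trace = []
    while ready:
        t = ready.popleft()
        try:
            trace.append(next(t))   # run until the thread yields
            ready.append(t)         # back of the ready queue
        except StopIteration:
            pass                    # thread finished
    return trace

def thread(name, steps):
    for i in range(steps):
        yield (name, i)             # voluntarily relinquish control here
```

Running two such threads interleaves them step by step, the same illusion of concurrency a real user-space runtime provides by saving and restoring registers.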
Kernel-space threads
• Kernel provides hooks for all thread calls (creation, destruction, semaphores, etc.)
• Each thread has a Process Control Block
• Advantages
  • System calls do not need to be changed
  • Spin locks are ok
  • Page faults are ok
• Disadvantages
  • Thread operations are expensive
Distributed Mutual Exclusion
• Example: account transfer
  • Withdraw $100 from Caltech
  • Deposit $100 into Jason’s account
• Caltech wants at-most-once semantics
• Jason wants at-least-once semantics
• To prevent conflicts, the transfer function should be a critical region
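In a single address space, the critical region the slide calls for is just a lock around the transfer. A minimal sketch, assuming accounts are entries in a shared dict (the names are invented here); the distributed case, where no shared lock exists, is what the next slides address.

```python
import threading

balance_lock = threading.Lock()

def transfer(accounts, src, dst, amount):
    # Critical region: no other thread can observe the state where the
    # money has left src but not yet reached dst.
    with balance_lock:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount
        accounts[dst] += amount
```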
Centralized mutual exclusion
Problems with centralized scheme
• Coordinator may fail
• Processes can’t normally distinguish between a blocked request and a dead coordinator
• Coordinator can become a performance bottleneck
Distributed mutex algorithms
• Lamport (1978)
• Ricart and Agrawala (1981)
• Use totally-ordered group communication
• REQUEST:
  • send REQUEST to all group members
  • the first one that arrives wins
• RELEASE:
  • send RELEASE to all group members
Distributed mutex
• Mutual exclusion is guaranteed by the total order
• O(n) messages per entry
• Problem: instead of 1 bottleneck, there are now n
• What if the process in the critical region fails?
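The heart of the Ricart-Agrawala scheme is its reply rule: a process that itself wants the region defers its REPLY to any request that is later than its own, where requests are totally ordered by (Lamport clock, process id) with ties broken on the id. A sketch of just that decision, with all message plumbing omitted and the function name invented here:

```python
def should_defer(my_request, incoming):
    """Decide whether to hold the incoming REQUEST until we release.

    my_request: our outstanding (clock, pid) request, or None if we
    do not currently want the critical region.
    incoming:   the (clock, pid) stamp on the REQUEST just received.
    """
    if my_request is None:          # not competing: reply at once
        return False
    # Lexicographic tuple comparison gives the total order: defer only
    # if our own request comes first.
    return my_request < incoming
```

Because every pair of requests is comparable under this total order, exactly one of any two competing processes defers, which is what guarantees mutual exclusion.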
Atomic transactions
• Mutual exclusion is a low-level concept, like message passing and semaphores
• We want a higher-level abstraction that makes it easier to write and reason about programs
• Atomic transactions (from the business world)
  • Dingbat corporation needs widgets; they approach US Widget for a quote on 100,000 10cm purple widgets for June
  • US Widget offers 100,000 4in fuchsia widgets for delivery in December
  • After negotiation, they agree on 3 959/1024 inch violet widgets for delivery in August
Atomic transactions
• Jason wants to transfer money from BofA to Wells Fargo
  • Withdraw ($10000, BofA.jyh)
  • Charter cuts the cable connection
  • Deposit ($10000, WF.jyh)
• An atomic transaction would solve the problem
• Either both operations complete, or neither completes
Transaction primitives
• e ::= atomic { e } | abort | read | write | e1; e2 | ...
• Example: reserve a flight from LAX to Malindi
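One way to model atomic { e } and abort in ordinary code is to run the body against a shadow copy of the store and install the copy only on commit. A sketch under that assumption; the `Store`/`Abort` names are invented here:

```python
import copy

class Abort(Exception):
    """Raised inside a transaction body to model the abort primitive."""

class Store:
    def __init__(self, data=None):
        self.data = data or {}

    def atomic(self, body):
        # Run body against a shadow copy: reads and writes touch only
        # the copy until the transaction commits.
        shadow = copy.deepcopy(self.data)
        try:
            body(shadow)
        except Abort:
            return False            # abort: no write becomes visible
        self.data = shadow          # commit: all writes appear at once
        return True
```

Applied to the slide's example: if Charter cuts the connection between the withdraw and the deposit, the body raises `Abort` and the BofA balance is untouched; only a completed body moves the money.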
Properties of transactions
• Atomic: to the outside world, the transaction is indivisible
• Consistent: the transaction preserves system invariants
• Isolated: two transactions do not interfere
• Durable: once a transaction commits, the changes are permanent
• (ACID)
Properties
• Atomic
  • Suppose a transaction starts writing to a file that had 10 bytes in it
  • Other processes will not see the additional data until the transaction commits
• Consistent
  • The transaction should preserve system invariants
  • Conservation of money: the invariant may be violated during the transaction, but the violation is not visible outside the transaction
Properties
• Isolated (serializable)
  • If two transactions run concurrently, the final result looks as if the two transactions executed in some serial order
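Serializability can be made concrete with a brute-force check, invented here for illustration: a concurrent outcome is isolated iff it equals the result of running the transactions in some serial order.

```python
from itertools import permutations

def is_serializable(initial, txns, outcome):
    """txns: pure functions state -> state; outcome: observed final state."""
    for order in permutations(txns):
        state = dict(initial)
        for t in order:
            state = t(state)        # run this serial schedule to completion
        if state == outcome:
            return True             # outcome matches some serial order
    return False
```

For example, with x = 1 and the transactions "add 1" and "double", the serial orders give 4 (add then double) or 3 (double then add); any other final value means the interleaving violated isolation.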