Languages for Distributed Systems
• Critical regions
• Transactions
• Fault-tolerance
• Automata
Computing Systems
Languages vs. Distributed Systems
• Goals of programming language design
  • Make it easy to program
  • Make it obvious that a program is correct
  • Design the language for the problem
  • Programs should be portable
  • Programs should be easy to maintain
• Goals of distributed systems
  • Performance
  • Availability, reliability, fault-tolerance
Principles of language design
• The language should provide a logical basis for design
• The compiler should worry about the details
• In particular, the program should not include architecture-specific details
Sequential programming
• C, Java, Pascal, FORTRAN, ...
• Strictly ordered execution
• Well-defined memory/resource model
• Side effects and weak compilers make it difficult to write correct code
• Compiler optimizations are hard
Functional programming
• ML, Lisp, Haskell, ... (without side effects)
• Execution order does not matter
• Well-defined memory/resource model
• Easier to get correct code
• Some data structures are hard to implement (e.g. splay trees)
• Compiler optimizations are relatively easy
Issues
• How do we exploit parallelism without making the programs obscure?
• SIMD machines
Dataflow machines
Dataflow
• A dataflow graph can be constructed from a serial program
• Processor partitioning tends to require fine-grain parallelism
• Requires the programmer to draw the dataflow graph explicitly
Traditional path
• Basic abstraction: processes
• A process is a program in execution
  • Logically, it has its own machine, including a CPU, infinite memory, and devices
• A thread is a process that shares its address space with another process
Threads
Building a file server
• Single-threaded server
  • Server waits for a request
  • When a request is received, the server performs the operation, then returns the result
  • Performance is poor, because the server is idle during file operations
• Multi-threaded server
  • Server contains several worker threads and a dispatcher
  • When a request arrives, the dispatcher assigns the job to a worker thread, then sleeps
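The multi-threaded design above can be sketched in Python: a dispatcher feeds requests into a queue, and worker threads pull jobs off it and perform the blocking file operation. The `serve`/`worker` names and the request format are invented for illustration.

```python
import queue
import threading

def read_file(path):
    # The blocking file operation a worker performs for one request.
    with open(path, "rb") as f:
        return f.read()

def worker(jobs, results):
    while True:
        path = jobs.get()
        if path is None:            # sentinel: shut this worker down
            break
        results.put((path, read_file(path)))

def serve(requests, n_workers=4):
    jobs, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for path in requests:           # the "dispatcher": hand each request off
        jobs.put(path)
    for _ in workers:               # one sentinel per worker
        jobs.put(None)
    for w in workers:
        w.join()
    out = {}
    while not results.empty():
        path, data = results.get()
        out[path] = data
    return out
```

While one worker blocks inside `read_file`, the other workers keep serving requests, which is exactly the idle-time problem the single-threaded design suffers from.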
Threaded file server
State-machine file server
• Server is a reactive state machine
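As a contrast with the threaded design, a minimal sketch of the reactive state-machine style: per-request state is kept explicitly in a table and advanced on each incoming event, rather than being held on a thread's stack. The event names and the handler are invented for illustration.

```python
def state_machine_server(events):
    # Explicit per-request state, advanced one event at a time.
    state = {}                      # request id -> current state
    log = []
    for req_id, event in events:
        if event == "REQUEST":
            state[req_id] = "READING"    # issue an async read; do not block
        elif event == "DISK_DONE":
            log.append(("reply", req_id))  # read finished: send the reply
            del state[req_id]
    return log
```

Because nothing blocks, one thread can interleave many requests; the cost is that the "where was I?" information a thread gets for free must be stored by hand.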
Advantages of threads
• Threads provide parallelism
• They maintain the illusion of sequential control
• Blocking system calls can be used naturally
• Special handling is needed only for inter-thread communication
Implementing threads
• User-space threads
  • Kernel is unaware of threading
  • Runtime takes care of switching between threads
• When a thread calls a blocking procedure (like a system call):
  • It calls a wrapper in the runtime
  • If the call is going to block, the runtime saves the registers, then loads the registers and stack pointer for another thread
• Fast
User-space threads
• Advantages
  • Fast
  • Each process can have its own scheduler
• Disadvantages
  • Kernel has to provide a mechanism for non-blocking system calls, OR all calls have to be checked with a select(2) operation
  • Page faults block all threads
  • Threads must voluntarily relinquish control; busy-waiting is not allowed
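The voluntary-relinquish point above can be illustrated with a toy user-space scheduler: "threads" are Python generators, and `yield` is the explicit point where a thread gives up control. As the slide notes, there is no preemption; a thread that never yields starves everyone else.

```python
from collections import deque

def scheduler(threads):
    # Round-robin over cooperating "threads" (generators).
    ready = deque(threads)
    trace = []
    while ready:
        t = ready.popleft()
        try:
            trace.append(next(t))   # run until the thread yields
            ready.append(t)         # back of the ready queue
        except StopIteration:
            pass                    # thread finished
    return trace

def thread(name, steps):
    for i in range(steps):
        yield (name, i)             # voluntarily relinquish control here
```

Running two such threads interleaves them step by step, the same illusion of concurrency a real user-space runtime provides by saving and restoring registers.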
Kernel-space threads
• Kernel provides hooks for all thread calls (creation, destruction, semaphores, etc.)
• Each thread has a Process Control Block
• Advantages
  • System calls do not need to be changed
  • Spin locks are ok
  • Page faults are ok
• Disadvantages
  • Thread operations are expensive
Distributed Mutual Exclusion
• Example: account transfer
  • Withdraw $100 from Caltech
  • Deposit $100 into Jason’s account
• Caltech wants at-most-once semantics
• Jason wants at-least-once semantics
• To prevent conflicts, the transfer function should be a critical region
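In a single address space, the critical region the slide calls for is just a lock around the transfer. A minimal sketch, assuming accounts are entries in a shared dict (the names are invented here); the distributed case, where no shared lock exists, is what the next slides address.

```python
import threading

balance_lock = threading.Lock()

def transfer(accounts, src, dst, amount):
    # Critical region: no other thread can observe the state where the
    # money has left src but not yet reached dst.
    with balance_lock:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount
        accounts[dst] += amount
```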
Centralized mutual exclusion
Problems with centralized scheme
• Coordinator may fail
• Processes can’t normally distinguish between a blocked request and a dead coordinator
• Coordinator can become a performance bottleneck
Distributed mutex algorithms
• Lamport (1978)
• Ricart and Agrawala (1981)
• Use totally-ordered group communication
• REQUEST:
  • send REQUEST to all group members
  • the first one that arrives wins
• RELEASE:
  • send RELEASE to all group members
Distributed mutex
• Mutual exclusion is guaranteed by the total order
• O(n) messages per entry
• Problem: instead of 1 bottleneck, there are now n
• What if the process in the critical region fails?
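The heart of the Ricart-Agrawala scheme is its reply rule: a process that itself wants the region defers its REPLY to any request that is later than its own, where requests are totally ordered by (Lamport clock, process id) with ties broken on the id. A sketch of just that decision, with all message plumbing omitted and the function name invented here:

```python
def should_defer(my_request, incoming):
    """Decide whether to hold the incoming REQUEST until we release.

    my_request: our outstanding (clock, pid) request, or None if we
    do not currently want the critical region.
    incoming:   the (clock, pid) stamp on the REQUEST just received.
    """
    if my_request is None:          # not competing: reply at once
        return False
    # Lexicographic tuple comparison gives the total order: defer only
    # if our own request comes first.
    return my_request < incoming
```

Because every pair of requests is comparable under this total order, exactly one of any two competing processes defers, which is what guarantees mutual exclusion.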
Atomic transactions
• Mutual exclusion is a low-level concept, like message passing and semaphores
• We want a higher-level abstraction that makes it easier to write and reason about programs
• Atomic transactions (from the business world)
  • Dingbat corporation needs widgets; they approach US Widget for a quote on 100,000 10cm purple widgets for June
  • US Widget offers 100,000 4in fuchsia widgets for delivery in December
  • After negotiation, they agree on 3 959/1024 inch violet widgets for delivery in August
Atomic transactions
• Jason wants to transfer money from BofA to Wells Fargo
  • Withdraw ($10000, BofA.jyh)
  • Charter cuts the cable connection
  • Deposit ($10000, WF.jyh)
• An atomic transaction would solve the problem
• Either both operations complete, or neither completes
Transaction primitives
• e ::= atomic { e } | abort | read | write | e1; e2 | ...
• Example: reserve a flight from LAX to Malindi
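One way to model atomic { e } and abort in ordinary code is to run the body against a shadow copy of the store and install the copy only on commit. A sketch under that assumption; the `Store`/`Abort` names are invented here:

```python
import copy

class Abort(Exception):
    """Raised inside a transaction body to model the abort primitive."""

class Store:
    def __init__(self, data=None):
        self.data = data or {}

    def atomic(self, body):
        # Run body against a shadow copy: reads and writes touch only
        # the copy until the transaction commits.
        shadow = copy.deepcopy(self.data)
        try:
            body(shadow)
        except Abort:
            return False            # abort: no write becomes visible
        self.data = shadow          # commit: all writes appear at once
        return True
```

Applied to the slide's example: if Charter cuts the connection between the withdraw and the deposit, the body raises `Abort` and the BofA balance is untouched; only a completed body moves the money.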
Properties of transactions
• Atomic: to the outside world, the transaction is indivisible
• Consistent: the transaction preserves system invariants
• Isolated: two transactions do not interfere
• Durable: once a transaction commits, the changes are permanent
• (ACID)
Properties
• Atomic
  • Suppose a transaction starts writing to a file that had 10 bytes in it
  • Other processes will not see the additional data until the transaction commits
• Consistent
  • The transaction should preserve system invariants
  • Conservation of money: the invariant may be violated during the transaction, but the violation is not visible outside the transaction
Properties
• Isolated (serializable)
  • If two transactions run concurrently, the final result looks as if the two transactions executed in some serial order
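Serializability can be made concrete with a brute-force check, invented here for illustration: a concurrent outcome is isolated iff it equals the result of running the transactions in some serial order.

```python
from itertools import permutations

def is_serializable(initial, txns, outcome):
    """txns: pure functions state -> state; outcome: observed final state."""
    for order in permutations(txns):
        state = dict(initial)
        for t in order:
            state = t(state)        # run this serial schedule to completion
        if state == outcome:
            return True             # outcome matches some serial order
    return False
```

For example, with x = 1 and the transactions "add 1" and "double", the serial orders give 4 (add then double) or 3 (double then add); any other final value means the interleaving violated isolation.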