
Distributed Operating Systems

Distributed Operating Systems. Luai M. Malhis Computer Engineering Department Most of the course materials are taken from http://lass.cs.umass.edu/~shenoy/courses/677. Course Outline. Introduction (today) What, why, why not? Basics Interprocess Communication


Presentation Transcript


  1. Distributed Operating Systems Luai M. Malhis Computer Engineering Department Most of the course materials are taken from http://lass.cs.umass.edu/~shenoy/courses/677 CS677: Distributed OS

  2. Course Outline • Introduction (today) • What, why, why not? • Basics • Interprocess Communication • RPCs, RMI, message- and stream-oriented communication • Processes and their scheduling • Thread/process scheduling, code/process migration • Naming and location management • Entities, addresses, access points

  3. Course Outline • Canonical problems and solutions • Mutual exclusion, leader election, clock synchronization, … • Resource sharing, replication and consistency • DSM, DFS, consistency issues, caching and replication • Fault-tolerance • Security in distributed systems • Distributed middleware • Advanced topics: web, multimedia, real-time and mobile systems

  4. Misc. Course Details • Textbook: Distributed Systems by Tanenbaum and Van Steen, Prentice Hall 2001 • Grading • 4-5 homeworks (20%), 3-4 programming assignments (35%) • 1 mid-term and 1 final (40%), class participation (5%) • Course mailing list: cs677@cs.umass.edu • You need to add yourself to this list! [ see class web page ] • Pre-requisites • Undergrad course in operating systems • Good programming skills in a high-level programming language

  5. Definition of a Distributed System • A distributed system: • Multiple connected CPUs working together • A collection of independent computers that appears to its users as a single coherent system • Examples: parallel machines, networked machines

  6. Advantages and Disadvantages • Advantages • Communication and resource sharing possible • Economics – price-performance ratio • Reliability, scalability • Potential for incremental growth • Disadvantages • Distribution-aware PLs, OSs and applications • Network connectivity essential • Security and privacy

  7. Transparency in a Distributed System Different forms of transparency in a distributed system.

  8. Scalability Problems Examples of scalability limitations.

  9. Hardware Concepts: Multiprocessors (1) • Multiprocessor dimensions • Memory: could be shared or be private to each CPU • Interconnect: could be shared (bus-based) or switched • A bus-based multiprocessor.

  10. Multiprocessors (2) a) A crossbar switch b) An omega switching network

  11. Homogeneous Multicomputer Systems a) Grid b) Hypercube

  12. Distributed Systems Models • Minicomputer model (e.g., early networks) • Each user has local machine • Local processing but can fetch remote data (files, databases) • Workstation model (e.g., Sprite) • Processing can also migrate • Client-server Model (e.g., V system, world wide web) • User has local workstation • Powerful workstations serve as servers (file, print, DB servers) • Processor pool model (e.g., Amoeba, Plan 9) • Terminals are Xterms or diskless terminals • Pool of backend processors handle processing

  13. Uniprocessor Operating Systems • An OS acts as a resource manager or an arbitrator • Manages CPU, I/O devices, memory • OS provides a virtual interface that is easier to use than hardware • Structure of uniprocessor operating systems • Monolithic (e.g., MS-DOS, early UNIX) • One large kernel that handles everything • Layered design • Functionality is decomposed into N layers • Each layer uses services of layer N-1 and implements new service(s) for layer N+1

  14. Uniprocessor Operating Systems • Microkernel architecture • Small kernel • User-level servers implement additional functionality

  15. Distributed Operating System • Manages resources in a distributed system • Seamlessly and transparently to the user • Looks to the user like a centralized OS • But operates on multiple independent CPUs • Provides transparency • Location, migration, concurrency, replication, … • Presents users with a virtual uniprocessor

  16. Types of Distributed OSs

  17. Multiprocessor Operating Systems • Like a uniprocessor operating system • Manages multiple CPUs transparently to the user • Each processor has its own hardware cache • Maintain consistency of cached data

  18. Multicomputer Operating Systems

  19. Network Operating System

  20. Network Operating System • Employs a client-server model • Minimal OS kernel • Additional functionality as user processes

  21. Middleware-based Systems • General structure of a distributed system as middleware.

  22. Comparison between Systems

  23. Communication in Distributed Systems • Issues in communication (today) • Message-oriented Communication • Remote Procedure Calls • Transparency but poor for passing references • Remote Method Invocation • RMIs are essentially RPCs but specific to remote objects • System-wide references passed as parameters • Stream-oriented Communication

  24. Communication Between Processes • Unstructured communication • Use shared memory or shared data structures • Structured communication • Use explicit messages (IPCs) • Distributed Systems: both need low-level communication support (why?)

  25. Communication Protocols • Protocols are agreements/rules on communication • Protocols could be connection-oriented or connectionless

  26. Layered Protocols • A typical message as it appears on the network.

  27. Client-Server TCP • Normal operation of TCP. • Transactional TCP.

  28. Middleware Protocols • Middleware: layer that resides between an OS and an application • May implement general-purpose protocols that warrant their own layers • Example: distributed commit

  29. Client-Server Communication Model • Structure: group of servers offering service to clients • Based on a request/response paradigm • Techniques: • Socket, remote procedure calls (RPC), Remote Method Invocation (RMI) [Figure: a client, a file server, a process server and a terminal server, each running on its own kernel]

  30. Issues in Client-Server Communication • Addressing • Blocking versus non-blocking • Buffered versus unbuffered • Reliable versus unreliable • Server architecture: concurrent versus sequential • Scalability

  31. Addressing Issues • Question: how is the server located? • Hard-wired address • Machine address and process address are known a priori • Broadcast-based • Server chooses address from a sparse address space • Client broadcasts request • Can cache response for future • Locate address via name server [Figure: three user/server pairs, one of which uses a name server (NS)]

  32. Blocking versus Non-blocking • Blocking communication (synchronous) • Send blocks until message is actually sent • Receive blocks until message is actually received • Non-blocking communication (asynchronous) • Send returns immediately • Receive does not block either • Examples:
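
The difference between the two modes can be sketched with Python's standard socket module. This is only an illustration: the local socket pair stands in for a client and a server, and the variable names are made up for the example.

```python
import socket

# A connected pair of local sockets stands in for client (a) and server (b).
a, b = socket.socketpair()

# Non-blocking receive: control returns immediately when nothing has arrived.
b.setblocking(False)
try:
    b.recv(16)
    got_data_early = True
except BlockingIOError:
    got_data_early = False  # nothing to read yet, and the caller was not blocked

# Blocking receive: the call waits until a message is actually received.
b.setblocking(True)
a.sendall(b"ping")
message = b.recv(16)

a.close()
b.close()
```

With a non-blocking receive the application has to poll or use an event loop to learn when data has arrived; the blocking version is simpler but ties up the caller.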

  33. Buffering Issues • Unbuffered communication • Server must call receive before client can call send • Buffered communication • Client sends to a mailbox • Server receives from a mailbox
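
A minimal sketch of buffered communication, assuming an in-process queue as the mailbox (a real system would keep the mailbox in the kernel). The client and server functions are hypothetical names for the example.

```python
import queue
import threading

mailbox = queue.Queue()  # assumption: an in-process queue plays the mailbox
received = []

def client():
    for i in range(3):
        # Send succeeds even though no receive is pending yet.
        mailbox.put(f"request-{i}")

def server():
    for _ in range(3):
        # Receive takes the oldest buffered message (blocks if the mailbox is empty).
        received.append(mailbox.get())

# The client runs to completion before the server even starts.
c = threading.Thread(target=client)
c.start()
c.join()

s = threading.Thread(target=server)
s.start()
s.join()
```

In the unbuffered case the same client would have to wait, or lose the message, until the server had called receive.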

  34. Reliability • Unreliable channel • Need acknowledgements (ACKs) • Applications handle ACKs • ACKs for both request and reply • Reliable channel • Reply acts as ACK for request • Explicit ACK for response • Reliable communication on unreliable channels • Transport protocol handles lost messages [Figure: request, reply and ACK exchanges between User and Server]
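
One way to picture reliable communication over an unreliable channel is a retransmission loop in which the reply doubles as the ACK for the request. The sketch below is a simulation under stated assumptions: the channel is a function that drops messages according to a scripted pattern, and a dropped message stands in for a timeout. All names are hypothetical.

```python
def make_lossy_channel(drop_pattern):
    """Return a send function that drops messages per drop_pattern (True = drop)."""
    drops = iter(drop_pattern)

    def send(msg):
        # Once the pattern is exhausted, every message is delivered.
        return None if next(drops, False) else msg

    return send


def reliable_request(request, server, channel_send, max_tries=5):
    """Retransmit until a reply arrives; the reply acts as the ACK for the request."""
    for attempt in range(1, max_tries + 1):
        delivered = channel_send(request)        # the request may be lost
        if delivered is None:
            continue                             # simulated timeout: retransmit
        reply = channel_send(server(delivered))  # the reply may be lost too
        if reply is not None:
            return reply, attempt
    raise TimeoutError("no reply after retries")


# Scripted losses: first request lost, then a reply lost, then success.
channel = make_lossy_channel([True, False, True, False, False])
reply, attempts = reliable_request("read x", lambda r: f"ok:{r}", channel)
```

Note that in this run the server executes the request twice, because the first reply was lost. Retransmission alone therefore gives at-least-once execution.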

  35. Server Architecture • Sequential • Serve one request at a time • Can service multiple requests by employing events and asynchronous communication • Concurrent • Server spawns a process or thread to service each request • Can also use a pre-spawned pool of threads/processes (e.g., Apache) • Thus servers could be • Pure-sequential, event-based, thread-based, process-based • Discussion: which architecture is most efficient?
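
The sequential and pre-spawned-pool designs can be contrasted in a few lines. This is a sketch only: handle is a made-up request handler, and a list of strings stands in for incoming requests.

```python
from concurrent.futures import ThreadPoolExecutor

def handle(request):
    # Hypothetical request handler; a real server would do I/O here.
    return request.upper()

requests = ["get a", "get b", "get c"]

# Pure-sequential server: one request at a time, in arrival order.
sequential = [handle(r) for r in requests]

# Concurrent server with a pre-spawned pool of worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = list(pool.map(handle, requests))  # map preserves request order
```

A pool avoids the cost of spawning a thread per request, at the price of a fixed upper bound on concurrency.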

  36. Scalability • Question: How can you scale the server capacity? • Buy a bigger machine! • Replicate • Distribute data and/or algorithms • Ship code instead of data • Cache

  37. To Push or Pull? • Client-pull architecture • Clients pull data from servers (by sending requests) • Example: HTTP • Pro: stateless servers, failures are easy to handle • Con: limited scalability • Server-push architecture • Servers push data to client • Example: video streaming, stock tickers • Pro: more scalable, Con: stateful servers, less resilient to failure • When/how-often to push or pull?

  38. Remote Procedure Calls • Goal: Make distributed computing look like centralized computing • Allow remote services to be called as procedures • Transparency with regard to location, implementation, language • Issues • How to pass parameters • Bindings • Semantics in face of errors • Two classes: integrated into the programming language, and separate

  39. Conventional Procedure Call Parameter passing in a local procedure call: a) The stack before the call to read b) The stack while the called procedure is active

  40. Parameter Passing • Local procedure parameter passing • Call-by-value • Call-by-reference: arrays, complex data structures • Remote procedure calls simulate this through: • Stubs – proxies • Flattening – marshalling • Related issue: global variables are not allowed in RPCs

  41. Client and Server Stubs • Principle of RPC between a client and server program.

  42. Stubs • Client makes procedure call (just like a local procedure call) to the client stub • Server is written as a standard procedure • Stubs take care of packaging arguments and sending messages • Packaging is called marshalling • Stub compiler generates stub automatically from specs in an Interface Definition Language (IDL) • Simplifies programmer task

  43. Steps of a Remote Procedure Call • Client procedure calls client stub in normal way • Client stub builds message, calls local OS • Client's OS sends message to remote OS • Remote OS gives message to server stub • Server stub unpacks parameters, calls server • Server does work, returns result to the stub • Server stub packs it in message, calls local OS • Server's OS sends message to client's OS • Client's OS gives message to client stub • Stub unpacks result, returns to client
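
The steps above can be sketched in a single process. This is an illustration under assumptions, not how any particular RPC system is implemented: JSON stands in for the wire format, the network function stands in for both operating systems and the network, and add is a made-up server procedure.

```python
import json

def add(a, b):
    # The server procedure, written as an ordinary function.
    return a + b

PROCEDURES = {"add": add}

def server_stub(message: bytes) -> bytes:
    # Unpack parameters, call the server procedure, pack the result.
    call = json.loads(message.decode())
    result = PROCEDURES[call["proc"]](*call["args"])
    return json.dumps({"result": result}).encode()

def network(message: bytes) -> bytes:
    # Stand-in for: client OS sends, remote OS delivers, reply travels back.
    return server_stub(message)

def client_stub_add(a, b):
    # Build the request message, hand it to the "OS", unpack the reply.
    request = json.dumps({"proc": "add", "args": [a, b]}).encode()
    reply = network(request)
    return json.loads(reply.decode())["result"]

# The client calls the stub exactly as it would call a local procedure.
total = client_stub_add(2, 3)
```

In a real system a stub compiler would generate client_stub_add and server_stub from an IDL description instead of having them written by hand.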

  44. Example of an RPC

  45. Marshalling • Problem: different machines have different data formats • Intel: little endian, SPARC: big endian • Solution: use a standard representation • Example: external data representation (XDR) • Problem: how do we pass pointers? • If it points to a well-defined data structure, pass a copy and the server stub passes a pointer to the local copy • What about data structures containing pointers? • Prohibit • Chase pointers over network • Marshalling: transform parameters/results into a byte stream
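
A small sketch of marshalling into a standard, machine-independent byte order, using Python's struct module. The format here (a 32-bit integer followed by a length-prefixed string, all big endian) is an assumption chosen for the example. It follows the same idea as XDR but is not XDR itself, which also pads fields to 4-byte boundaries.

```python
import struct

def marshal(x: int, name: str) -> bytes:
    # "!" selects network (big-endian) byte order regardless of the host CPU.
    data = name.encode("utf-8")
    return struct.pack("!iI", x, len(data)) + data

def unmarshal(buf: bytes):
    # Read the fixed-size header, then the variable-length string.
    x, n = struct.unpack_from("!iI", buf, 0)
    name = buf[8:8 + n].decode("utf-8")
    return x, name

wire = marshal(1, "abc")
value, label = unmarshal(wire)
```

On the wire the integer 1 is always the bytes 00 00 00 01, so a little-endian sender and a big-endian receiver agree on its value.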

  46. Binding • Problem: how does a client locate a server? • Use Bindings • Server • Export server interface during initialization • Send name, version number, unique identifier, handle (address) to binder • Client • First RPC: send message to binder to import server interface • Binder: check to see if server has exported interface • Return handle and unique identifier to client
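
The export/import exchange can be sketched as a small in-memory registry. The Binder class, its method names, and the example address are all hypothetical; a real binder would be a separate network service.

```python
class Binder:
    """Toy binder: maps (interface name, version) to (handle, unique id)."""

    def __init__(self):
        self._exports = {}

    def export(self, name, version, uid, handle):
        # Called by a server during initialization.
        self._exports[(name, version)] = (handle, uid)

    def lookup(self, name, version):
        # Called on a client's first RPC; None means nothing was exported.
        return self._exports.get((name, version))


binder = Binder()
binder.export("fileserver", 1, uid=42, handle=("10.0.0.5", 9000))

found = binder.lookup("fileserver", 1)    # handle and unique identifier
missing = binder.lookup("fileserver", 2)  # wrong version: not exported
```

Keying on the version as well as the name lets the binder refuse a client that was built against an interface the server no longer offers.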

  47. Failure Semantics • Client unable to locate server: return error • Lost request messages: simple timeout mechanisms • Lost replies: timeout mechanisms • Make operation idempotent • Use sequence numbers, mark retransmissions • Server failures: did failure occur before or after operation? • At least once semantics (SUNRPC) • At most once • No guarantee • Exactly once: desirable but difficult to achieve
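
Sequence numbers let a server recognize a retransmission and avoid repeating a non-idempotent operation. The sketch below assumes a per-client sequence number and a reply cache kept in memory; the class and its deposit-style operation are invented for the example.

```python
class AtMostOnceServer:
    """Executes each (client, sequence number) request at most once."""

    def __init__(self):
        self.balance = 0
        self._replies = {}  # (client_id, seq) -> cached reply

    def handle(self, client_id, seq, amount):
        key = (client_id, seq)
        if key in self._replies:
            # Retransmission of a request already executed: replay the reply.
            return self._replies[key]
        self.balance += amount  # non-idempotent: must not run twice
        self._replies[key] = self.balance
        return self.balance


server = AtMostOnceServer()
first = server.handle("c1", 1, 10)   # original request
retry = server.handle("c1", 1, 10)   # client timed out and retransmitted
second = server.handle("c1", 2, 5)   # a genuinely new request
```

The cache is lost if the server crashes, which is one reason exactly-once semantics is hard to achieve.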

  48. Failure Semantics • Client failure: what happens to the server computation? • Referred to as an orphan • Extermination: log at client stub and explicitly kill orphans • Overhead of maintaining disk logs • Reincarnation: Divide time into epochs between failures and delete computations from old epochs • Gentle reincarnation: upon a new epoch broadcast, try to locate owner first (delete only if no owner) • Expiration: give each RPC a fixed quantum T; explicitly request extensions • Periodic checks with client during long computations

  49. Implementation Issues • Choice of protocol [affects communication costs] • Use existing protocol (UDP) or design from scratch • Packet size restrictions • Reliability in case of multiple packet messages • Flow control • Copying costs are dominant overheads • Need at least 2 copies per message • From client to NIC and from server NIC to server • As many as 7 copies • Stack in stub – message buffer in stub – kernel – NIC – medium – NIC – kernel – stub – server • Scatter-gather operations can reduce overheads

  50. Case Study: SUNRPC • One of the most widely used RPC systems • Developed for use with NFS • Built on top of UDP or TCP • TCP: stream is divided into records • UDP: max packet size < 8912 bytes • UDP: timeout plus limited number of retransmissions • TCP: return error if connection is terminated by server • Multiple arguments marshaled into a single structure • At-least-once semantics if reply received, at-least-zero semantics if no reply. With UDP tries at-most-once • Use SUN’s eXternal Data Representation (XDR) • Big endian order for 32-bit integers, handle arbitrarily large data structures
