
Distributed Operating Systems


Presentation Transcript


  1. Distributed Operating Systems

  2. Coverage • Distributed Systems (DS) Paradigms • DS & OS’s • Services and Models • Communication • Distributed File Systems • Coordination • Distributed Mutual Exclusion (ME) • Distributed Co-ordination • Synchronization • DS Scheduling & Misc. Issues

  3. What is a Distributed System? • “A distributed system is the one preventing you from working because of the failure of a machine that you had never heard of.” (Leslie Lamport) • “A distributed system is a collection of independent computers that appear to the users of the system as a single computer.” (Tanenbaum) • Figure: shared-memory multiprocessor, message-passing multiprocessor, distributed system → Multiple computers sharing (the same) state and interconnected by a network

  4. Distribution: Example Pros/Cons • All the good stuff: high performance, distributed access, scalability, heterogeneity, sharing (concurrency), load balancing (migration, relocation), fault tolerance, … • Bank account database (DB) example • Naturally centralized: easy consistency and performance • Fragment DB among regions: exploit locality of reference, improve security & reduce reliance on the network for remote access • Replicate each fragment for fault tolerance • But we now need (additional) DS techniques • Route each request to the right fragment • Maintain access/consistency of the fragments as a whole database • Maintain access/consistency of each fragment’s replicas • …
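
The first of these techniques, routing a request to the right fragment and then to a live replica of it, can be sketched roughly as follows. The region names, the `"<REGION>-<number>"` account-id format, and the replica names are all hypothetical and chosen only for illustration; the slides do not prescribe any particular scheme.

```python
# Hypothetical layout: accounts are fragmented by region and each
# fragment is replicated on two machines.
FRAGMENTS = {
    "EU": ["eu-replica-1", "eu-replica-2"],
    "US": ["us-replica-1", "us-replica-2"],
}

def route(account_id, failed=frozenset()):
    """Return the first live replica of the fragment that owns account_id."""
    # Assumed id format: "<REGION>-<number>", e.g. "EU-1042".
    region = account_id.split("-", 1)[0]
    for replica in FRAGMENTS[region]:
        if replica not in failed:
            return replica
    raise RuntimeError(f"no live replica for fragment {region}")

print(route("EU-1042"))                        # served by the EU fragment
print(route("US-7", failed={"us-replica-1"}))  # falls over to the second replica
```

Keeping the replicas of a fragment consistent with each other is a separate problem that this sketch does not address.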

  5. Transparency: Global Access • Fragmentation transparency: hide whether the resource is fragmented or not • Illusion of a single computer across a DS • Distribution transparency: • Access transparency + location transparency + replication transparency + fragmentation transparency

  6. Multiprocessor OS Types (1) • Each CPU has its own operating system • Shared bus → communication blocking & CPU idling!

  7. Multiprocessor OS Types (2) • Master-slave multiprocessors • Master is a bottleneck!

  8. Multiprocessor OS Types (3) • Symmetric multiprocessors (SMP) • Eliminates the CPU bottleneck, but raises mutual exclusion (ME) and synchronization issues • Mutex on the OS?

  9. OS’s for DS’s • Loosely-coupled OS • A collection of computers, each running its own OS; the OS’s allow sharing of resources across machines • A.K.A. Network Operating System (NOS) • Manages heterogeneous multicomputer DS • Difference: provides local services to remote clients via remote login • Data transfer from remote OS to local OS via FTP (File Transfer Protocol) • Tightly-coupled OS • OS tries to maintain a single global view of the resources it manages • A.K.A. Distributed Operating System (DOS) • Manages multiprocessors & homogeneous multicomputers • Similar “local access feel” to a non-distributed, standalone OS • Data migration or computation migration modes (entire process or threads)

  10. Network Operating Systems (NOS)

  11. Distributed Operating Systems (DOS)

  12. Client Server Model for DOS & NOS

  13. Middleware • Can we have the best of both worlds? • Scalability and openness of a NOS • Transparency and relative ease of a DOS • Solution: additional layer of SW above NOS • Mask heterogeneity • Improve distribution transparency (and others) → “Middleware” (MW)

  14. Middleware (& Openness) • Document-based MW (e.g., WWW) • Coordination-based MW (e.g., Linda, publish/subscribe, Jini, etc.) • File system-based MW • Shared object-based MW

  15. File System-Based Middleware (1) • Approach: make a DS look like a big file system • Transfer models: • Download/upload model (work done locally) • Remote access model (work done remotely)

  16. File System-Based Middleware (2) (a) Two file systems (b) Naming Transparency: All clients have same view of FS (c) Some clients with different FS views

  17. File System-Based Middleware (3) • Semantics of file sharing (ordering and session semantics) • (a) single processor gives sequential consistency • (b) distributed system may return obsolete value
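
Case (b) can be illustrated with a toy model of session semantics under the download/upload model: each client works on a private copy taken at open time, and its writes reach the server only on close. The `FileServer` and `OpenFile` classes and the file name are hypothetical, not part of any real file system API.

```python
class FileServer:
    """Holds the authoritative copy of each file."""
    def __init__(self):
        self.files = {"notes.txt": "v1"}

class OpenFile:
    """A client-side session: download at open, upload at close."""
    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.local_copy = server.files[name]   # download the whole file

    def read(self):
        return self.local_copy                  # served from the local copy

    def write(self, data):
        self.local_copy = data                  # not yet visible to others

    def close(self):
        self.server.files[self.name] = self.local_copy   # upload on close

server = FileServer()
writer = OpenFile(server, "notes.txt")
reader = OpenFile(server, "notes.txt")

writer.write("v2")
stale = reader.read()       # reader still sees the obsolete value
writer.close()
fresh = OpenFile(server, "notes.txt").read()   # a new session sees the update

print(stale, fresh)
```

On a single processor both clients would go through the same copy, so the read after the write would already return the new value.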

  18. Shared Object-Based Middleware (1) • Approach: make a DS look like objects (variables + methods) • Easy scaling to large systems • Replicated objects (C++, Java) • Flexibility • Example 1: main elements of a CORBA (Common Object Request Broker Architecture) based system: • Object Request Broker (ORB) • Internet Inter-ORB Protocol (IIOP)

  19. Shared Object-Based Middleware (2) • Example 2: Globe [1] • provides a model of distributed shared objects and basic support services (naming, locating objects etc.) • an object is completely self-contained • designed to scale to a billion users, a trillion objects around the world GIDS: Globe Infrastructure Directory Service [1] M. van Steen, P. Homburg, and A. Tanenbaum. “Globe: A Wide-Area Distributed System”, 1999.

  20. Shared Object-Based Middleware (3) A distributed shared object in Globe • can have its state copied on multiple computers at once • how to maintain sequential consistency of write operations?

  21. OS’s, DS’s & MW

  22. Network Hardware The Internet

  23. Network Services and Protocols • Network services: (blocking) (non-blocking) • Internet Protocol (IP) • Transmission Control Protocol (TCP): connection-oriented • User Datagram Protocol (UDP): connectionless

  24. Client-Server Communications • Unbuffered msg. passing • send(addr, msg), recv(addr, msg) • all requests and replies at C/S level • all msg. acks between kernels only • Buffered msg. passing • msg. sent to kernel mailbox or kernel/user interface socket • Figure labels: client, server, kernel; msg. directed at a process • Blocking? Non-blocking?

  25. Remote Procedure Calls (RPC) • Synchronous/asynchronous (blocking/non-blocking) communication • [Sync] client generates request, stub → kernel • [Sync] kernel blocks the process until the reply is received from the server • [Async] buffers msg.
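
The difference between the blocking and non-blocking styles can be sketched with a thread pool standing in for the network and the remote server. `remote_square`, `call_sync`, and `call_async` are hypothetical names used only for illustration; a real RPC system would send a message instead of running a local thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_square(x):
    """Stand-in for a procedure that runs on a remote server."""
    time.sleep(0.05)          # pretend network + processing delay
    return x * x

def call_sync(x):
    # Synchronous: the caller is blocked until the reply is available.
    return remote_square(x)

def call_async(pool, x):
    # Asynchronous: the request is handed off (buffered) and the caller
    # gets a future back immediately; it collects the reply later.
    return pool.submit(remote_square, x)

with ThreadPoolExecutor(max_workers=1) as pool:
    print(call_sync(4))               # blocks for the whole call
    future = call_async(pool, 5)      # returns at once
    # ... the client could do other work here ...
    print(future.result())            # blocks only now, if the reply is not in yet
```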

  26. RPC & Stubs (dummy procedure in place of RPC) • Legend: C: client; CS: client stub; K: kernel; S: server; SS: server stub; NS: network service • Client side: • [C] call “client stub” procedure • [CS] prepare msg. buffer • [CS] load parameters into buffer • [CS] prepare msg. header • [CS] send trap to kernel • [K] context switch to kernel • [K] copy msg. to kernel • [K] determine server address (NS) • [K] put address in header • [K] set up network interface • [K] start timer for msg. • Server side, in execution order: • [K] process interrupt (save PC, kernel state) • [K] check packet for validity • [K] decide which stub to assign • [K] see if stub is waiting • [K] copy msg. to stub • [K] context switch to server stub • [SS] set up parameter stack/unbundle • [SS] call server • [S] process req.; initiate “server stub”
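
The stub-level steps above (load parameters into a buffer on the client, unbundle them and call the server procedure on the other side) can be sketched as follows. This is a minimal illustration only: JSON is an assumed wire format, `transport` is a hypothetical stand-in for the kernel and network steps, and `PROCEDURES` is a hypothetical dispatch table.

```python
import json

def add(a, b):
    """The real procedure, living on the server."""
    return a + b

PROCEDURES = {"add": add}     # hypothetical server-side dispatch table

def server_stub(message: bytes) -> bytes:
    request = json.loads(message)                 # unbundle parameters
    procedure = PROCEDURES[request["proc"]]
    result = procedure(*request["args"])          # call the server procedure
    return json.dumps({"result": result}).encode()

def transport(message: bytes) -> bytes:
    # Stand-in for the kernel and network steps (trap, copy, address
    # lookup, timers); here the message is simply handed to the server stub.
    return server_stub(message)

def client_stub_add(a, b):
    """Looks like a local add() to the caller, but ships the call away."""
    message = json.dumps({"proc": "add", "args": [a, b]}).encode()   # marshal
    reply = transport(message)                                       # "trap to kernel"
    return json.loads(reply)["result"]                               # unmarshal reply

print(client_stub_add(2, 3))
```

The caller never sees the message buffer, which is the point of the stub: the remote call has the same shape as a local one.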

  27. Remote Procedure Call Implementation Issues • Can we pass pointers? (local context…) • Call by reference becomes copy-restore (but might fail) • Weakly typed languages (e.g., C allows computations such as a product of arrays without array size specifications) • Can the client stub determine the unspecified size to pass on? • Not always possible to determine parameter types • Cannot use global variables • C/S may get moved to a remote machine
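
A rough sketch of copy-restore, and of one way it can differ from true call by reference: when the same object is passed for two parameters, the server works on two separate copies. Python lists stand in for pointer arguments, and `call_copy_restore` and `add_one_to_both` are hypothetical helpers written for this illustration.

```python
def add_one_to_both(x, y):
    """Procedure that updates both of its (reference) parameters in place."""
    x[0] += 1
    y[0] += 1

def call_copy_restore(proc, *args):
    copies = [list(a) for a in args]      # copy in: what the stub would ship
    proc(*copies)                         # the "server" works on its own copies
    for original, copy in zip(args, copies):
        original[:] = copy                # restore: copy results back to the caller

# Local call by reference with an aliased argument: incremented twice.
a = [0]
add_one_to_both(a, a)

# Same call through copy-restore: each copy is incremented once,
# so the caller observes a different result.
b = [0]
call_copy_restore(add_one_to_both, b, b)

print(a, b)
```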

  28. RPC Failures? • C/S failure vs. communication failure? • Who detects? Timeouts? • Does it matter whether a node (C/S) failed • BEFORE or AFTER a request arrived? • BEFORE or AFTER a request was processed? • Client failure: orphan requests? Add expiration counters • Server crash?

  29. Communication • Delivers messages despite • communication link(s) failure • process failures • Main kinds of failures to tolerate • timing (link and process) • omission (link and process) • value

  30. Communication: Reliable Delivery • Omission failure tolerance (degree k) • Design choices: • Error masking (spatial): send over several (> k) links • Error masking (temporal): repeat the transmission k+1 times • Error recovery: detect error and recover
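
Temporal error masking can be sketched as below: if a link loses at most k transmissions, sending k+1 copies guarantees that at least one arrives, with no feedback from the receiver. `LossyLink` is a hypothetical worst-case model (it drops the first k sends) and is not taken from the slides.

```python
class LossyLink:
    """Hypothetical link that suffers at most k omission failures."""
    def __init__(self, k):
        self.drops_left = k
        self.delivered = []

    def send(self, msg):
        if self.drops_left > 0:       # worst case: the first k sends are lost
            self.drops_left -= 1
            return
        self.delivered.append(msg)

def send_masked(link, msg, k):
    """Temporal masking: blindly repeat the message k+1 times."""
    for _ in range(k + 1):
        link.send(msg)

worst_case = LossyLink(k=2)
send_masked(worst_case, "m1", k=2)
print(worst_case.delivered)           # exactly one copy survives

no_loss = LossyLink(k=0)
send_masked(no_loss, "m1", k=2)
print(set(no_loss.delivered))         # receiver must discard duplicate copies
```

The cost is k extra transmissions per message even when nothing is lost, which is the tradeoff against detection-and-recovery schemes.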

  31. Reliable Delivery (cont.) • Error detection and recovery: ACKs and timeouts • Positive ACK: sent when a message is received • Timeout on sender without ACK: sender retransmits • Negative ACK (NACK): sent when a message loss is detected • Needs sequence #s or time-based reception semantics • Tradeoffs • Positive ACKs: usually faster failure detection • NACKs: fewer msgs… • Q: what kinds of situations are good for • Spatial error masking? • Temporal error masking? • Error detection and recovery with positive ACKs? • Error detection and recovery with NACKs?
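
The positive-ACK scheme can be sketched with a scripted channel, so that the run is deterministic. Each transmission has one of three assumed outcomes: `"ok"`, `"msg_lost"`, or `"ack_lost"`. The `Receiver` class and `send_reliably` function are hypothetical; the sequence numbers let the receiver discard a duplicate that arrives after a lost ACK.

```python
class Receiver:
    def __init__(self):
        self.expected = 0          # next sequence number to deliver
        self.delivered = []

    def receive(self, seq, payload):
        if seq == self.expected:   # new message: deliver it
            self.delivered.append(payload)
            self.expected += 1
        return seq                 # positive ACK (duplicates are re-ACKed)

def send_reliably(messages, receiver, outcomes, max_tries=5):
    """Send each message until it is ACKed; returns total transmissions."""
    outcomes = iter(outcomes)
    transmissions = 0
    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            transmissions += 1
            outcome = next(outcomes, "ok")
            if outcome == "msg_lost":
                continue                       # nothing arrives; sender times out
            ack = receiver.receive(seq, payload)
            if outcome == "ack_lost":
                continue                       # delivered, but sender times out anyway
            if ack == seq:
                break                          # ACK received, move on
        else:
            raise TimeoutError(f"no ACK for message {seq}")
    return transmissions

receiver = Receiver()
count = send_reliably(["a", "b"], receiver,
                      ["msg_lost", "ok", "ack_lost", "ok"])
print(receiver.delivered, count)
```

In this run "b" reaches the receiver twice because its first ACK is lost, but it is delivered to the application only once.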

  32. Resilience to Sender Failure • Multicast FT-communication is harder than point-to-point • The basic problem is failure detection • A subset of the receivers may receive the msg., then the sender fails • Solutions depend on the flavor of multicast reliability • Unreliable: no effort to overcome link failures • Best-effort: some steps taken to overcome link failures • Reliable: participants coordinate to ensure that all or none of the correct recipients get it
