
CS 603 Review

CS 603 Review. April 24, 2002. Seminar Announcements. Saurabh Bagchi, “Hierarchical Error Detection in a Distributed Software Implemented Fault Tolerance (SIFT) Environment” April 25, 10:30-11:30, MSEE 239 Fabian E. Bustamante, “The Active Streams Approach to Adaptive Distributed Systems


Presentation Transcript


  1. CS 603 Review April 24, 2002

  2. Seminar Announcements • Saurabh Bagchi, “Hierarchical Error Detection in a Distributed Software Implemented Fault Tolerance (SIFT) Environment” • April 25, 10:30-11:30, MSEE 239 • Fabian E. Bustamante, “The Active Streams Approach to Adaptive Distributed Systems” • April 29, 10:30-11:30, CS 101

  3. Review • Why do we want distributed systems? • Scaling • Heterogeneity • Geographic Distribution • What is a distributed system? • Transparency vs. Exposing Distribution • Hardware Basics • Communication Mechanisms

  4. Basic Software Concepts • Hiding vs. Exposing • Distribution – Distributed OS • Location, but not distribution – Middleware • None – Network OS • Concurrency Primitives • Semaphores • Monitors • Distributed System Models • Client-Server • Multi-Tier • Peer to Peer

  5. Communication Mechanisms • Shared Memory • Enforcement of single-system view • Delayed consistency: δ-Common Storage • Message Passing • Reliability and its limits • Stream-oriented Communications • Remote Procedure Call • Remote Method Invocation

  6. RPC Mechanisms • DCE • Language / Platform Independent • Implementation Issues: • Data Conversion • Underlying Mechanisms • Fault Tolerance Approaches • Java RMI • SOAP • Interoperable • Language independent • Transport independent (anything that moves XML)

  7. Naming Requirements • Disambiguate only • Access resource given the name • Build a name to find a resource • Do humans need to use name? • Static/Dynamic Resource • Performance Requirements

  8. Registry Example: X.500 • Goal: Global “white pages” • Lookup anyone, anywhere • Developed by Telecommunications Industry • ISO standard directory for OSI networks • Idea: Distributed Directory • Application uses Directory User Agent to access a Directory Access Point • Basis for LDAP, ActiveDirectory

  9. Directory Information Base (X.501) • Tree structure • Root is entire directory • Levels are “groups” • Country • Organization • Individual • Entry structure • Unique name • Built from tree • Attributes: Type/value pairs • Schema enforces type rules • Alias entries

  10. X.500 • Directory Entry: • Organization level – CN=Purdue University, L=West Lafayette • Person level – CN=Chris Clifton, SN=Clifton, TITLE=Associate Professor • Directory Operations • Query, Modify • Authorization / Access control • To directory • Directory as mechanism to implement for others

  11. X.500 – Distributed Directory • Directory System Agent • Referrals • Replication • Cache vs. Shadow copy • Access control • Modifications at Master only • Consistency • Each entry must be internally consistent • DSA giving copy must identify as copy

  12. Clock Synchronization • Definition: All nodes agree on time • What do we mean by time? • What do we mean by agree? • Lamport Definition: Events • Events partially ordered • Clock “counts” the order

  13. Event-based definition (Lamport ’78) Define partial order of events • A → B: A “happened before” B: Smallest relation such that: • If A and B are in the same process and A occurs first, A → B • If A is the sending of a message and B is the receipt of that message, A → B • If A → B and B → C, then A → C • Clock: C(x) is the time at which x occurs: • C(x) = Ci(x) where x occurs on node i. • Clocks correct if ∀ a,b: a → b ⇒ C(a) < C(b)

  14. Lamport Clock Implementation • Node i increments Ci between any two successive events • If event a is the sending of a message m from i to j, • m contains timestamp Tm = Ci(a) • Upon receiving m, set Cj ≥ current Cj and > Tm • Can now define total ordering. a ⇒ b iff: • Ci(a) < Cj(b), or • Ci(a) = Cj(b) and Pi < Pj
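
The clock rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the class name `LamportClock`, its methods, and the `before` helper are all made up here, and the receive rule uses `max(Cj, Tm) + 1`, which is one simple way to satisfy "≥ current Cj and > Tm".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LamportClock:
    """Logical clock for one node; node_id breaks ties in the total order."""
    node_id: int
    time: int = 0

    def tick(self) -> int:
        # Increment between any two successive local events.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is itself an event; the message carries Tm = Ci(a).
        return self.tick()

    def receive(self, tm: int) -> int:
        # Move to a value that is >= the current clock and strictly > Tm.
        self.time = max(self.time, tm) + 1
        return self.time


def before(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Total order on (timestamp, node_id): compare clocks, then node ids."""
    return a < b


p1, p2 = LamportClock(node_id=1), LamportClock(node_id=2)
p1.tick()              # local event on node 1, C1 = 1
tm = p1.send()         # send event on node 1, Tm = 2
recv = p2.receive(tm)  # receive event on node 2, C2 = 3
```

Comparing `(timestamp, node_id)` tuples gives the tie-break by process id directly, since Python orders tuples lexicographically.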

  15. What if we want “wall clock” time? • Ci must run at correct rate: • ∃ κ << 1 such that | dCi(t)/dt – 1 | < κ • Synchronized: • ∃ small ε such that ∀ i,j: | Ci(t) – Cj(t) | < ε • Assume transmission time between μ and μ+ξ • Algorithm: Upon receiving message m, set Cj(t) = max(Cj(t), Tm+μ) • Theorem: Assume every τ seconds a message with unpredictable delay ξ is sent over every arc of a network with diameter d. Then for all t ≥ t0 + τd, ε ≈ d(2κτ + ξ)
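
To get a feel for the bound ε ≈ d(2κτ + ξ), it helps to plug in numbers. The sketch below takes d to be the network diameter, as in Lamport's paper; the function name `sync_bound` and the sample figures are hypothetical, chosen only for illustration.

```python
def sync_bound(d: int, kappa: float, tau: float, xi: float) -> float:
    """Approximate skew bound eps ~ d * (2*kappa*tau + xi), in seconds."""
    return d * (2 * kappa * tau + xi)


# Hypothetical figures: diameter 3, drift rate 1e-6,
# a message on every arc each 10 s, 1 ms of unpredictable delay.
eps = sync_bound(d=3, kappa=1e-6, tau=10.0, xi=1e-3)
```

With these figures the delay uncertainty ξ dominates the drift term 2κτ (1 ms against 0.02 ms), which matches the next slide's point that delay uncertainty sets the limit.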

  16. Clock Synchronization: Limits • Best Possible: Delay Uncertainty • Actually ε(1 – 1/n) • Synchronization with Faults • Faulty clock • Communication Failure • Malicious processor • Worst case: Can only synchronize if < 1/3 processors faulty • Better if clocks can be authenticated

  17. Process Synchronization • Problem: Shared Resources • Model as sequential or parallel process • Assumes global state! • Alternative: Mutual Exclusion when Needed • Coordinator approach • Token Passing • Timestamp

  18. Mutual Exclusion • Requirements • Does it guarantee mutual exclusion? • Does it prevent starvation? • Is it fair? • Does it scale? • Does it handle failures?

  19. Mutual Exclusion: Colored Ticket Algorithm • Goals: • Decentralized • Fair • Fault tolerant • Space Efficient • Idea: Numbered Tickets • Next number gets resource • Problem: Unbounded Space • Solution: Reissue blocks

  20. Multi-Resource Mutual Exclusion • New Problem: Deadlock • Processes using all resources • Each needs additional resource to proceed • Dining Philosophers Problem • Coordinated vs. truly distributed solutions • Problems with deterministic solutions • Probabilistic solution – Lehmann & Rabin • Starvation / fairness properties

  21. Distributed Transactions • ACID properties • Issues: • Commit Protocols • Fault Tolerance • Why is this enough? • Failure Models and Limitations • Mechanisms: • Two-phase commit • Three-phase commit

  22. Two-Phase Commit (Lamport ’76, Gray ’79) • Central coordinator initiates protocol • Phase 1: • Coordinator asks if participants can commit • Participants respond yes/no • Phase 2: • If all votes yes, coordinator sends Commit • Participants respond when done • Blocks on failure • Participants must replace coordinator • If participant and coordinator fail, wait for recovery • While blocked, transaction must remain Isolated • Prevents other transactions from completing
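
The failure-free path of the two phases can be sketched as follows. This is a single-process illustration with no network, timeouts, or logging; the names `Participant`, `Vote`, and `two_phase_commit` are invented here. The slide only states the all-yes case, so the sketch assumes the usual rule that any "no" vote leads to abort.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):
        # Phase 1: vote. A YES voter must stay prepared (holding its
        # locks) until it learns the coordinator's decision.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def finish(self, decision):
        # Phase 2: apply the coordinator's decision.
        self.state = decision


def two_phase_commit(participants):
    """Coordinator logic: collect votes, then broadcast one decision."""
    votes = [p.prepare() for p in participants]               # Phase 1
    all_yes = all(v is Vote.YES for v in votes)
    decision = "committed" if all_yes else "aborted"
    for p in participants:                                    # Phase 2
        p.finish(decision)
    return decision


ok = two_phase_commit([Participant("A", True), Participant("B", True)])
mixed = [Participant("A", True), Participant("B", False)]
failed = two_phase_commit(mixed)
```

The blocking problem from the slide lives in the gap between the two loops: a participant in state `prepared` cannot decide on its own if the coordinator disappears before Phase 2 reaches it.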

  23. Transaction Model • Transaction Model • Global Transaction State • Reachable State Graph • Local states potentially concurrent if a reachable global state contains both local states • Concurrency set C(s) is all states potentially concurrent with s • Sender set S(s) = {local states t | t sends m and s can receive m} • Failure Model • Site failure assumed when expected message not received in time • Independent Recovery

  24. Problems with 2-PC • Blocking on failure • 3-PC as solution • Theorems on recovery limits • Independent recovery: No two-site failure • Non-independent recovery • Anything short of total failure okay • Recovery protocol for total failure

  25. Data Replication • Fault Tolerance • Hot backup • Catastrophic failure • Performance • Parallelism • Decreased reliance on network • Correctness criterion: Replication invisible • One-copy serializability (1SR)

  26. Data Replication: How? • Goal: Ensure one-copy serializability • Write-all solution: All copies identical • Write goes to every site • Read from any site • Standard single-copy concurrency control • Guarantees 1SR • Single-copy concurrency control gives serializable execution • Equivalent to serial execution where all writes happen in one transaction
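
A toy version of the write-all scheme, assuming an in-memory set of sites and ignoring concurrency control: every write goes to all sites, and a read may use any one of them. The names `Site`, `WriteAllStore`, and `SiteDown` are hypothetical. A real system would block while holding locks when a site is unreachable; raising an exception here only stands in for that.

```python
class SiteDown(Exception):
    """Raised where a real system would block waiting for a failed site."""


class Site:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class WriteAllStore:
    def __init__(self, sites):
        self.sites = sites

    def write(self, key, value):
        # Write-all: the write must reach every copy or not happen at all.
        down = [s.name for s in self.sites if not s.up]
        if down:
            raise SiteDown("cannot write, sites down: " + ", ".join(down))
        for s in self.sites:
            s.data[key] = value

    def read(self, key, site_index=0):
        # Read-any: all copies are identical, so one site is enough.
        site = self.sites[site_index]
        if not site.up:
            raise SiteDown(site.name)
        return site.data[key]


store = WriteAllStore([Site("A"), Site("B"), Site("C")])
store.write("x", 1)
values = [store.read("x", i) for i in range(3)]
```

Marking one site as down (`store.sites[1].up = False`) makes every later write fail, which is the availability cost of keeping all copies identical.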

  27. Problem: Site Failure • Failure causes write to block • Must maintain locks • Clogs up entire system • Is this fault tolerance? • What about “write all available”? • T0: w0[xA] w0[xB] w0[yC] c0 • B-fails • T1: r1[yC] w1[xA] c1 • B-recovers • T2: r2[xB] w2[yC] c2 • What is the serial equivalent order?
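
One way to explore the question is to try every serial order on a single-copy database and compare what each transaction would read. The sketch below is an illustration under stated assumptions: it checks only the reads-from relation (not final writes), and the names `TXNS`, `OBSERVED`, and `reads_from` are invented here. In the replicated history, T1 reads y from T0, and T2 reads xB, which missed T1's write and therefore still holds T0's value.

```python
from itertools import permutations

# One-copy view of each transaction: ordered (operation, item) pairs.
TXNS = {
    "T0": [("w", "x"), ("w", "y")],
    "T1": [("r", "y"), ("w", "x")],
    "T2": [("r", "x"), ("w", "y")],
}

# Reads-from relation observed in the replicated history.
OBSERVED = {("T1", "y"): "T0", ("T2", "x"): "T0"}


def reads_from(order):
    """Reads-from relation of a serial one-copy execution in this order."""
    last_writer, result = {}, {}
    for txn in order:
        for op, item in TXNS[txn]:
            if op == "r":
                # None means the read saw the initial value.
                result[(txn, item)] = last_writer.get(item)
            else:
                last_writer[item] = txn
    return result


# Serial orders whose reads match what the replicated history observed.
equivalent = [o for o in permutations(TXNS) if reads_from(o) == OBSERVED]
```

No serial order survives: T1 must precede T2 because T2 overwrites the y that T1 read, and T2 must precede T1 because T2 read the x that T1 overwrites. The history is therefore not one-copy serializable.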

  28. Write All Available Fails • Even if no recovery!

  29. Solutions • Validate availability on commit • Check if any failed writes now available • Check that all sites read or written still available • Enforces serializability for site failures • Doesn’t work with communication failures!

  30. Formalisms for Relaxed consistency • Goal: Relaxed consistency constraints • Meet application needs • Outperform true transparent replication • How do we ensure constraints meet needs? • Formalisms to describe application needs • Methods to prove constraints adequate

  31. Quasi-Copies (Alonso, Barbará, Garcia-Molina ’90) • Data Caching • Each site keeps copy of data likely to be used locally • Propagation cost of writes high • User-Defined Cache • Controlled Divergence • Weak consistency constraints • Bounds on the differences between copies • User defines constraints

  32. Assumptions • Read-only copies • Updates sent to master copy • E.g., ORACLE Materialized View • User Specified Coherency • Strict limits • “Hints” • Example: Stock Purchase • Place order based on delayed price • Limit order to ensure price paid okay

  33. Selection Conditions • Identification clause • Select/Project Query • Modifier Clause • Add / drop from cache • Compulsory or advisory cache • Static / Dynamic: As new objects meet the identification clause, are they cached? • Triggering delay on dynamic

  34. Coherency Conditions • Default (always enforced): Value was true once • Delay W(x,α): Max time lag • Version V(x): Number of updates • Periodic P(x): Time for refresh • Arithmetic A(x): Bounded Difference • Combine conditions with logical operators • Multi-object conditions • Consistency conditions on a group • Order of application in a group
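
The delay, version, and arithmetic conditions can be read as predicates over a cached copy and its master. The sketch below is a simplified reading, not the formal definitions from the paper: the record types, function names, and bounds (`alpha`, `max_versions`, `epsilon`) are assumptions made for illustration, and the delay check uses time since the last refresh as a conservative stand-in for the time lag.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    value: float      # value held in the quasi-copy
    version: int      # master version the copy was taken from
    copied_at: float  # time of the last refresh, in seconds


@dataclass
class Master:
    value: float
    version: int


def delay_ok(copy: Copy, now: float, alpha: float) -> bool:
    # Delay condition (simplified): refreshed within the last alpha seconds.
    return now - copy.copied_at <= alpha


def version_ok(copy: Copy, master: Master, max_versions: int) -> bool:
    # Version condition: the copy has missed at most max_versions updates.
    return master.version - copy.version <= max_versions


def arithmetic_ok(copy: Copy, master: Master, epsilon: float) -> bool:
    # Arithmetic condition: values differ by at most epsilon.
    return abs(master.value - copy.value) <= epsilon


copy = Copy(value=100.0, version=7, copied_at=50.0)
master = Master(value=103.0, version=9)

# Conditions combine with ordinary logical operators.
fresh_enough = delay_ok(copy, now=55.0, alpha=10.0) and version_ok(copy, master, 2)
close_enough = arithmetic_ok(copy, master, epsilon=2.0)
```

In the stock example from the earlier slide, an arithmetic condition like `close_enough` is what a limit order effectively enforces: the cached price may be stale, but only within a known bound.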

  35. CS 603 Review April 26, 2002

  36. Remote Operation Mechanisms • Client-Server Model: • Remote Procedure Call • Problem: Remote Site must already know what we want to do! • Process consists of: • Code • Resources (files, devices, etc.) • Execution (data, stack, registers, etc.) • Fork copies everything • Is this needed? • Solution: Copy part of the process

  37. So where are we? • Models for Remote Processing • Server: Request documented service • RPC: Request execution of existing procedure • What if operation we want isn’t offered remotely? • Solution: Agents / Code Migration

  38. Types of Code Migration • [Figure: from Andrew Tanenbaum, Distributed Operating Systems, 1995]

  39. Resource Binding

  40. DCOM – What is it? • Start with COM – Component Object Model • Language-independent object interface • Add interprocess communication

  41. DCOM: Distributed COM • Looks like COM to the client • Built on DCE RPC • Extends to support full COM functionality

  42. DCOM Architecture

  43. Locating Objects: Activation • CoCreateInstance(Ex)(<CLSID>) • Interface pointer to uninitialized instance • Same as COM • CoGetInstanceFromFile, FromStorage • Create new instance • CoGetClassObject(<CLSID>) • Factory object that creates objects of <CLSID> • CoGetClassObjectFromURL • Downloads necessary code from URL and instantiates • Can take server name as parameter • Or default to server specified in DCOM configuration on client machine [HKEY_CLASSES_ROOT\APPID\{<appid-guid>}] "RemoteServerName"="<DNS name>" • Also store information in ActiveDirectory

  44. DCOM vs. CORBA • CORBA: • Single interface name • Multiple inheritance • Dynamic Invocation Interface • C++-style Exception Handling • Explicit and Implicit reference counts • Implemented by ORB with replaceable services • DCOM: • Distinction between Class and Instance Identifier • Implement multiple interfaces • Type libraries for on-demand marshaling • 32 Bit Error Code • Explicit reference count only • Implemented by many independent services

  45. What is .NET? • Language for distributed computation • C#, VB.NET, JScript • Protocols • SOAP, HTTP • Run-time environment • Common Language Runtime (CLR) • ActiveDirectory • Web Servers (ASP.NET)

  46. COM/DCOM → .NET • DCOM: • IDL • Name, Monikers • Registry / ActiveDirectory • C++, Visual Basic • DCE RPC • DCOM Network protocol (based on DCE standards) • .NET: • Web Services Description Language (WSDL) • DISCO (URI grammar) • Universal Description Discovery and Integration (UDDI) • C#, VB.NET • SOAP • HTTP (presumed ubiquitous), SMTP (!?)

  47. How .NET works • Query UDDI directory to get service location • Query service to get WSDL (interface specification) • Build call (XML) based on WSDL spec. • Make call using SOAP • Parse XML results based on WSDL spec.
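
The "build call" and "parse results" steps amount to writing and reading XML inside a SOAP envelope. The sketch below uses only Python's standard library and the SOAP 1.1 envelope namespace; the service namespace `urn:example:quotes`, the `GetQuote` operation, and the helper names are hypothetical, and the UDDI lookup, WSDL retrieval, and HTTP transport are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope


def build_call(service_ns: str, operation: str, params: Dict[str, str]) -> bytes:
    """Build a SOAP request whose Body holds one operation element."""
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}%s" % (service_ns, operation))
    for name, value in params.items():
        ET.SubElement(call, "{%s}%s" % (service_ns, name)).text = value
    return ET.tostring(envelope, encoding="utf-8")


def find_text(xml_bytes: bytes, service_ns: str, tag: str) -> Optional[str]:
    """Return the text of the first element named tag in the service namespace."""
    root = ET.fromstring(xml_bytes)
    node = root.find(".//{%s}%s" % (service_ns, tag))
    return None if node is None else node.text


# Hypothetical service and operation, for illustration only.
request = build_call("urn:example:quotes", "GetQuote", {"symbol": "ABC"})
symbol = find_text(request, "urn:example:quotes", "symbol")
```

In a real client the element names and types would come from the service's WSDL rather than being hard-coded, which is the point of querying it first.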

  48. Jini: Java Middleware • Tools to construct federation • Multiple devices, each with Java Virtual Machine • Multiple services • Uses (doesn’t replace) Java RMI • Adds infrastructure to support distribution • Registration • Lookup • Security

  49. Service • Basic “unit” of Jini system • Members provide services • Federate to share access to services • Services combined to accomplish tasks • Communicate using service protocol • Initial set defined • Add more on the fly

  50. Infrastructure: Key Components • RMI • Basic communication model • Distributed Security System • Integrated with RMI • Extends JVM security model • Discovery/join protocol • How to register and advertise services • Lookup services • Returns object implementing service (really a local proxy)
