
Communication and Data Sharing for Dynamic Distributed Systems




  1. Communication and Data Sharing for Dynamic Distributed Systems • Nancy Lynch (MIT) • Alex Shvartsman (UConn)

  2. Motivation and Focus • Constructing distributed applications for highly dynamic environments is difficult • In practice, considerable effort is required to make applications resilient to • changes in client requirements • evolution of the underlying computing medium • Focus of our work • design and analysis of distributed services • that provide useful guarantees and • that make the construction of sophisticated distributed applications easier.

  3. Our Approach • Traditionally • research on distributed services emphasized specification and correctness, while • research on distributed algorithms emphasized complexity and performance • We combine these concerns leading to • algorithms that perform efficiently and degrade gracefully in dynamic distributed settings, and • whose correctness, performance, and fault-tolerance guarantees are expressed by precisely-defined global services.

  4. Research Direction Summary • Develop and analyze algorithms to solve problems of communication and data sharing in highly dynamic distributed environments • “Dynamic” encompasses • Changes in network topology • Processor mobility • Changing sets of participants • Wide range of failures • Timing variations

  5. Research Direction (cont’d) • The properties we study include • ordering and reliability guarantees for communication • coherence guarantees for data sharing • The algorithmic results will be accompanied by • lower bound and impossibility results, • which describe inherent limitations on what problems can be solved, and at what cost.

  6. RAMBO: Reconfigurable Atomic Memory for Read/Write Objects • Nancy Lynch • Alex Shvartsman

  7. Design Goals • RAMBO • Reconfigurable Atomic Memory for Basic Objects (Read/Write) for message-passing systems • Dynamic replication for availability and survivability • Loosely-coupled on-the-fly reconfiguration • High concurrency • Low latency • Safety for any patterns of asynchrony and failures • Good performance under partial asynchrony and for moderate failures

  8. Algorithmic Ideas • Reconfigurable quorum systems • Quorums maintain consistency during modest and transient changes • Reconfigurations accommodate more drastic and permanent changes • Read/write operations are frequent • Use quorum access and allow concurrency • Isolate from reconfiguration • Reconfigurations are infrequent • Use consensus to impose total order (Paxos) • Optimistic dissemination without formal installation • Conservative garbage collection of obsolete configurations

  9. Related Prior Work • Atomic read/write memory in message-passing models • Upfal Wigderson 86 • Attiya Bar-Noy Dolev 91, 95 • Lynch Shvartsman 97 • Englert Shvartsman 01 • Paxos • Lamport 89, 98 • Quorums • Gifford 79, Thomas 79 • and many many others

  10. Methodology • Specify algorithm • Interacting state machines • Using non-deterministic “gossip” • Show correctness/safety for • arbitrary patterns of asynchrony • assuming arbitrary crash-failures and message loss • Analyze performance for a subset of timed executions • Bounded message delay, 0-time local processing • Some “gossip” becomes deliberate, some periodic • Non-failure of certain quorums for certain periods • Reason about operation latency • (Of course none of this impacts safety)

  11. Showing Read/Write Atomicity • We show atomicity using a partial order • Atomicity of a sequence β of reads/writes • Let ≺ be an irreflexive PO of all op-s in β. Show: • For any π, finitely many φ such that φ ≺ π • If π precedes φ, then not φ ≺ π • If π is write then either π ≺ φ or φ ≺ π • Any read returns value written by last write, per [Lynch, Lemma 13.16]

  12. Approach: Values and Tags • Each value v has an associated tag t • A tag is a (sequence number, processor id) pair • Reads: • a set of value-tag pairs is obtained • the result is the value with the maximum tag • Writes: • a set of value-tag pairs is obtained • new-value is propagated with a new-tag that is a lexicographic increment of the maximum tag: new-tag := ⟨tag.seq + 1, pid⟩
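A minimal Python sketch of the tag rules on this slide. The names `Tag`, `max_pair`, and `next_tag` are hypothetical, and integer processor ids are an assumption; the slide only says that tags are sequence-processor pairs compared lexicographically.

```python
from typing import Any, Iterable, NamedTuple, Tuple


class Tag(NamedTuple):
    seq: int  # sequence number
    pid: int  # id of the writing processor (assumed to be an int here)


def max_pair(pairs: Iterable[Tuple[Any, Tag]]) -> Tuple[Any, Tag]:
    """Return the (value, tag) pair with the largest tag.

    NamedTuples compare field by field, which gives the lexicographic order.
    """
    return max(pairs, key=lambda p: p[1])


def next_tag(pairs: Iterable[Tuple[Any, Tag]], pid: int) -> Tag:
    """Tag for a new write: largest sequence number seen plus one, with the writer's id."""
    _, largest = max_pair(pairs)
    return Tag(largest.seq + 1, pid)


# Value-tag pairs as they might be gathered from several processes.
replies = [("a", Tag(3, 1)), ("b", Tag(3, 2)), ("c", Tag(2, 9))]
read_result = max_pair(replies)[0]   # a read returns the value with the maximum tag
write_tag = next_tag(replies, pid=1) # a write by processor 1 gets a strictly larger tag
```

Because the sequence number is incremented, the new tag is larger than every tag in the gathered set regardless of the writer's id; the id only breaks ties between concurrent writers that pick the same sequence number.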

  13. Using Quorum Systems • Given a set I (a set of processor ids) • A quorum system is a pair • < read-quorums, write-quorums > • Where • read-quorums is a collection of subsets of I • write-quorums is a collection of subsets of I • Such that • For any R in read-quorums and W in write-quorums, R ∩ W ≠ ∅ • For any W1 and W2 in write-quorums, W1 ∩ W2 ≠ ∅
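The two intersection conditions can be checked mechanically. The sketch below is illustrative only: `is_quorum_system` and `majorities` are hypothetical helper names, and majority quorums are just one familiar way to satisfy the conditions, not something this slide prescribes.

```python
from itertools import combinations, product


def is_quorum_system(read_quorums, write_quorums) -> bool:
    """Check the two intersection conditions from the slide."""
    # Every read quorum meets every write quorum.
    rw = all(set(r) & set(w) for r, w in product(read_quorums, write_quorums))
    # Every two write quorums meet.
    ww = all(set(a) & set(b) for a, b in product(write_quorums, repeat=2))
    return rw and ww


def majorities(ids):
    """All minimal majorities of ids (used here only as an example quorum system)."""
    ids = sorted(ids)
    k = len(ids) // 2 + 1
    return [frozenset(c) for c in combinations(ids, k)]


I = {1, 2, 3, 4, 5}
maj = majorities(I)
ok = is_quorum_system(maj, maj)          # majorities always intersect
bad = is_quorum_system([{1}], [{2, 3}])  # {1} and {2, 3} are disjoint
```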

  14. High-Level Functions • Joiner • Introduces new participants to the system • Reader-Writer • Routine read and write operations • Two-phased algorithm using all “known” configurations • Using tags • Reconfiguration • Chooses new next configuration • Informs members of the previous configuration • Garbage collection (“packaged” with Reader-Writer) • Identify and remove obsolete configurations

  15. RAMBO System • [Figure: the RAMBO system, with components labeled Joiner, Reader-Writer, Recon, Cons, and Network]

  16. Architectural View • Each component is formally specified • Input/Output Automata [Tuttle Lynch] • Joiners are specified as Joiner_i for i in I • Reader-Writers are Reader-Writer_i for i in I • Reconfigurers are Recon_i for i in I • Consensus instances are Cons(k,c) for k in N, c in C • Where the members of configuration c decide on the configuration number k • Network is specified in terms of Channel_{i,j} for i, j in I • Assumed only to be “honest” • The System is then the composition of all automata

  17. Configurations and Config Maps • Configuration c • members(c) -- set of members of configuration c • read-quorums(c) -- set of read quorums • write-quorums(c) -- set of write quorums • Configuration map cm • mapping from naturals to configurations • cm(k) is the configuration k, and it can be • defined, undefined (⊥), garbage-collected (±) • [Figure: a configuration map shown as a row of entries, with regions labeled G-C-ed (±), Defined (c), “Mixed”, and Undefined (⊥)]
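One way to picture these definitions is as a small data structure. This is a sketch under assumptions: the class and method names are hypothetical, and the merge rule (a garbage-collected entry overrides a defined one, which overrides an undefined one) is an assumption about how configuration information is combined, not something stated on the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Union


class Marker(Enum):
    UNDEFINED = "undefined"        # the slide's "undefined" entry
    GC = "garbage-collected"       # the slide's "±" entry


@dataclass(frozen=True)
class Configuration:
    members: FrozenSet[int]
    read_quorums: FrozenSet[FrozenSet[int]]
    write_quorums: FrozenSet[FrozenSet[int]]


Entry = Union[Configuration, Marker]


class ConfigMap:
    """Mapping from natural numbers to configurations or markers."""

    def __init__(self) -> None:
        self._entries: Dict[int, Entry] = {}

    def get(self, k: int) -> Entry:
        # Indices never written are undefined.
        return self._entries.get(k, Marker.UNDEFINED)

    def update(self, k: int, entry: Entry) -> None:
        # Assumed rule: entries only move undefined -> defined -> garbage-collected.
        current = self.get(k)
        if current is Marker.GC or entry is Marker.UNDEFINED:
            return
        if current is Marker.UNDEFINED or entry is Marker.GC:
            self._entries[k] = entry

    def merge(self, other: "ConfigMap") -> None:
        for k, entry in other._entries.items():
            self.update(k, entry)


c0 = Configuration(frozenset({1, 2, 3}),
                   frozenset({frozenset({1, 2})}),
                   frozenset({frozenset({2, 3})}))
c1 = Configuration(frozenset({3, 4, 5}),
                   frozenset({frozenset({3, 4})}),
                   frozenset({frozenset({4, 5})}))

cm = ConfigMap()
cm.update(0, c0)

other = ConfigMap()
other.update(0, Marker.GC)
other.update(1, c1)

cm.merge(other)
```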

  18. Configuration Maps • [Figure: a configuration map evolving over time. Initially only c0 is defined; then c0, c1; then c0, c1, c2, ..., ck. Entries are then garbage-collected (±) from the front, one at a time, leaving a map of the form ±, ..., ±, c, c, c, followed by undefined entries.]

  19. Reader-Writer Protocol • One “gossip” message • < World, value, tag, cmap, ns, nr > • Message from a sender s to a receiver r is such that • World is s’s set of participants, and r ∈ World • value and tag are the object value and its tag at s • cmap is the configuration map at s • ns and nr are sender’s and best known receiver’s phase numbers used to identify “fresh” messages • These messages are • Sent non-deterministically • For performance analysis we impose an additional deterministic send policy • Certain actions are taken when “enough” info is gathered
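The message layout can be written down as a record. In this sketch the field names follow the slide, while the types and the two helper functions are assumptions. In particular, the freshness test (the message's nr must be at least the receiver's current phase number) is one plausible reading of how phase numbers identify “fresh” messages, not a rule quoted from the slides.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Gossip:
    world: FrozenSet[int]   # sender's set of known participants
    value: Any              # object value at the sender
    tag: Tuple[int, int]    # tag of that value: (sequence number, processor id)
    cmap: Dict[int, Any]    # sender's configuration map
    ns: int                 # sender's phase number
    nr: int                 # best receiver phase number known to the sender


def well_addressed(msg: Gossip, receiver: int) -> bool:
    """The slide requires the receiver to be in the sender's World."""
    return receiver in msg.world


def is_fresh(msg: Gossip, receiver_phase: int) -> bool:
    """Assumed rule: the sender had already seen the receiver's current phase."""
    return msg.nr >= receiver_phase


m = Gossip(world=frozenset({1, 2}), value="x", tag=(3, 1),
           cmap={0: "c0"}, ns=5, nr=4)
```

A receiver in phase 4 would count `m` toward its current phase, while a receiver that has since moved to phase 5 would treat it as stale under this assumed rule.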

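  20. Read/Write Protocol • [Figure: RAMBO_i contains Reader-Writer_i and Recon_i. Labeled actions: read_i, write(v)_i, read-ack(v)_i, write-ack_i, and new-config(c,k)_i. Gossip is exchanged with Reader-Writer_j, ..., Reader-Writer_n, which sit next to Recon_j, ..., Recon_n inside RAMBO_j, ..., RAMBO_n.]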

  21. Reader-Writer Code • [Figure: code listing with parts labeled Start read, Start write, Send, Receive, New cfg, Query fix, Prop fix, End read, End write]

  22. The Phase Pattern • Send to a collection of processes in “known” configs • Collect responses and update configuration information • Continue until a certain predicate is satisfied • [Figure: flowchart. Start, Send, Collect responses (Recv), then “Fixpoint reached?”: if no, continue sending; if yes, End.]
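The loop on this slide can be sketched sequentially. This is a deliberate simplification: the real phases run over asynchronous gossip, whereas here a hypothetical callback `contact(c)` stands in for "a quorum of configuration c has responded" and returns whatever new configurations those responses revealed. The function name `run_phase` and the termination predicate (every known configuration has been contacted) are assumptions for illustration.

```python
from typing import Callable, Iterable, Set, Tuple


def run_phase(initial_configs: Iterable[str],
              contact: Callable[[str], Set[str]]) -> Tuple[Set[str], int]:
    """Contact configurations until no known configuration is left uncontacted."""
    known = set(initial_configs)
    done: Set[str] = set()
    rounds = 0
    while known - done:              # fixpoint not reached yet
        c = min(known - done)        # deterministic pick, just for the sketch
        known |= contact(c)          # responses may reveal newer configurations
        done.add(c)
        rounds += 1
    return known, rounds


# Hypothetical discovery pattern: contacting c0 reveals c1, and c1 reveals c2.
discoveries = {"c0": {"c1"}, "c1": {"c2"}, "c2": set()}
known, rounds = run_phase({"c0"}, lambda c: discoveries[c])
```

The point of the sketch is that learning about a new configuration extends the work of the current phase, so the phase only ends once the set of known configurations stops growing.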

  23. Read and Write Operations • Reads and Writes use Query and Propagation phases involving known quorum configurations • Query obtains information about “latest” operations from read quorums & updates configurations • Propagation disseminates the results of “latest” operation to write quorums & updates configurations • Fixed point must be reached -- discovery of new configurations requires new quorums to be reached • [Figure: timeline of a read or write: Start Query, End Query, Start Prop., End Prop.]
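A toy, single-process sketch of the two phases may help fix ideas. It assumes one static configuration, no failures, and direct in-memory access to replicas, so it leaves out reconfiguration, gossip, and fixed points entirely. All names (`Replica`, `query_phase`, `propagate_phase`, `read`, `write`) are hypothetical.

```python
from typing import Any, Dict, Set, Tuple

TagT = Tuple[int, int]  # (sequence number, processor id)


class Replica:
    def __init__(self) -> None:
        self.value: Any = None
        self.tag: TagT = (0, 0)

    def query(self) -> Tuple[Any, TagT]:
        return self.value, self.tag

    def update(self, value: Any, tag: TagT) -> None:
        # Adopt the incoming pair only if its tag is larger.
        if tag > self.tag:
            self.value, self.tag = value, tag


def query_phase(replicas: Dict[int, Replica], read_quorum: Set[int]):
    """Collect value-tag pairs from a read quorum and keep the one with the largest tag."""
    return max((replicas[i].query() for i in read_quorum), key=lambda p: p[1])


def propagate_phase(replicas, write_quorum: Set[int], value: Any, tag: TagT) -> None:
    """Push a value-tag pair to every member of a write quorum."""
    for i in write_quorum:
        replicas[i].update(value, tag)


def write(replicas, read_quorum, write_quorum, pid: int, value: Any) -> None:
    _, (seq, _) = query_phase(replicas, read_quorum)
    propagate_phase(replicas, write_quorum, value, (seq + 1, pid))


def read(replicas, read_quorum, write_quorum) -> Any:
    value, tag = query_phase(replicas, read_quorum)
    propagate_phase(replicas, write_quorum, value, tag)  # readers propagate too
    return value


replicas = {i: Replica() for i in range(5)}
write(replicas, {0, 1, 2}, {2, 3, 4}, pid=7, value="x")
result = read(replicas, {0, 1, 2}, {0, 1, 2})
```

The read finds the written value through replica 2, the one process shared by its read quorum and the writer's write quorum, and its propagation phase then brings replicas 0 and 1 up to date.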

  24. Reader-Writer: Send/Recv

  25. Reader-Writer: Fixed Points

  26. Why Readers Propagate • If the readers do not propagate, atomicity can be easily violated: • [Figure: a slow write of v1 is in progress while most processes still hold v0; a read of v1 is followed by a read of v0]

  27. Joining Protocol • [Figure: RAMBO_i contains Joiner_i, Reader-Writer_i, and Recon_i; RAMBO_j contains Joiner_j, Reader-Writer_j, and Recon_j. Labels: join(J)_i, join, ack, gossip.]

  28. Garbage Collection • When a process has a configuration map cmap of the form ±, ..., ±, c_k, c_{k+1}, ... it can garbage-collect configuration cmap(k) = c_k • Two-phase protocol using the “gossip” messages • Update own tag & value by obtaining the “best” tag and value from a read- and write-quorum of cmap(k) • Propagate tag & value to a write-quorum of cmap(k+1) • Set cmap(k) to ± • This “bootstraps” configuration k in case it is “too new”
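The precondition on the configuration map can be expressed as a small predicate. In this sketch the map is a plain list, the sentinels `GC` and `UNDEF` and the function names are hypothetical, and the two communication phases of the protocol are omitted; only the shape check and the final marking step are shown.

```python
from typing import List, Optional

GC = "gc"        # stands for a garbage-collected entry
UNDEF = "undef"  # stands for an undefined entry


def collectable_index(cmap: List[str]) -> Optional[int]:
    """Return k if cmap has the form [GC, ..., GC, c_k, c_{k+1}, ...], else None."""
    k = 0
    while k < len(cmap) and cmap[k] == GC:
        k += 1  # skip the garbage-collected prefix

    def defined(j: int) -> bool:
        return j < len(cmap) and cmap[j] not in (GC, UNDEF)

    # Both the configuration to remove and its successor must be defined.
    return k if defined(k) and defined(k + 1) else None


def mark_collected(cmap: List[str], k: int) -> List[str]:
    """Final step of the protocol: set cmap(k) to the garbage-collected marker."""
    new_map = list(cmap)
    new_map[k] = GC
    return new_map


cmap = [GC, GC, "c2", "c3", UNDEF]
k = collectable_index(cmap)
after = mark_collected(cmap, k)
```

Requiring a defined successor reflects the protocol's second phase, which needs a write-quorum of cmap(k+1) to propagate to.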

  29. Reconfiguration • Very simple protocol for Recon_i • Reconfiguration is free of atomicity concerns • Initiator i (multiple initiators are allowed) • Accepts reconfiguration request recon(c,c’)_i from environment: reconfigure from c to c’ • If c is the locally-known “latest” configuration k-1, informs members of c of the reconfiguration • Calls Paxos for k to decide on “next” configuration c’ • Informs Reader-Writer_i of the new configuration • Participants i • Learn about the initiation of reconfiguration • Participate in Paxos • Inform Reader-Writer_i of the new configuration

  30. Latency Analysis • Certain gossip and messages become “important” • Messages to members of “active” configurations when read or write is performed • Messages to configurations k and k+1 when garbage collection is performed • Specific messages when joining and reconfiguring • Responses to such messages • Consider “good” timed executions • Bounded message delay d • 0 local processing time • Environment is well-formed

  31. Additional Assumptions • These assumptions are used in some results • Configuration-viability for time parameter e • If c becomes “known” as configuration k anywhere • Then either one read- and one write-quorum of c stays alive forever • Or if by time t another configuration is decided upon by non-faulty members of c, then one read- and one write-quorum of c stays alive until t+e • Reconfiguration-spacing for time parameter e • recon(c,*)_i occurs at least e time after report(c)_i • Join-connectivity for time parameter e • If i and j join by time t then they learn about each other by time t+e

  32. Latency Bounds (selected) • Joining: • 2d, provided “joiner” and “joinee” do not fail • Reconfiguration: • In 0-configuration-viable executions • If recon(c,c’)_i action occurs by time t and no members of c fail after t, then recon-ack_i occurs at t+12d+ • Garbage-collection of c_k at non-faulty i : • 4d, if R in read-quorums(c_k), W1 in write-quorums(c_k), and W2 in write-quorums(c_{k+1}) do not fail • Read and write operations in “stable” systems • If no reconfig-s in progress, then process with “up-to-date” config map completes its operation in 4d • (These do not depend on “gossip”)

  33. More Latency (1) • These bounds depend on periodic gossip • Learning new configurations • If i and j are “old enough” and do not fail, then information from i is conveyed to j within time 2d • Garbage-collection when reconfigurations are 6d-spaced and executions are 6d-configuration-viable • If recon(c,*) occurs before t and c is “known” by t-6d then any non-faulty process that is “old enough” learns about c and garbage-collects any older configuration by time t+6d • All non-faulty “old enough” processes have one or two defined configurations in their configuration maps

  34. More Latency (2) • Read and write operations (with periodic gossip) • Complete in time 8d for non-faulty processes that are “old enough”, provided execution satisfies 12.1d-recon-spacing and 6d-configuration-viability • Learning in failure-free executions • Let J be the set of processes that joined by time t • 1. Then by time t + log|J|, J ⊆ world_i for any i in J • 2. If i in J “knows” a configuration at time t’, then any j in J learns about it by max(t + log|J|, t’) + 2d

  35. Algorithmic Innovations • Dynamic owners of data: • Any and all owners may request reconfiguration • the set of owners can be changed dynamically • Dynamic configurations: • Arbitrary configurations can be installed • no constraints on intersection of quorum sets or member sets in distinct configurations. • Loosely-coupled reconfiguration: • Concurrent reads, writes and reconfiguration • If finitely many reconfigurations occur during a read or write operation, then its completion does not depend on whether any reconfigurations complete

  36. Algorithmics (cont’d) • Efficient “steady-state”: • Assuming bounded delays, infrequent reconfig-s, and periodic gossip, reads and writes complete in time that is a constant multiple of the message delay • Assuming periodic garbage collection, readers/writers only deal with 1 or 2 configurations • Fast “catch-up”: • New “joiners” with out-of-date configurations can catch up after a logarithmic number of message exchanges provided the “joiners graph” is connected

  37. Comparison with Other Approaches • Paxos or a similar consensus service can be used to agree on global order of operations • We only agree on the sequence of configurations • Consensus termination impacts only Recon • Reads/writes are not affected by consensus • Group communication systems can also be used • Our algorithm is “from scratch”: low-level send-receive, no hidden/relative costs • Reads/writes work during “new view” establishment • Dynamic quorums / dynamic configurations work • We allow arbitrary new configurations - no static • Our earlier work also solves this problem • New work: concurrent recon-s and garbage-collect

  38. Work in Progress and Futures • Full-fledged implementation is under development • Additional analysis in progress • “Normal timing” starts at some point • Trade-off between configuration-viability and garbage collection • Analysis of “join-connectivity” graphs • Algorithmic refinements • Elimination of unnecessary communication • Explicit “leave” protocol • Gossip: “owners” vs. “users” of objects
