
Flexible Update Propagation for Weakly Consistent Replication

Karin Petersen, Mike K. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers. Presented by Ryan Huebsch, CS294-4 P2P Systems, 10/13/03.


Presentation Transcript


  1. Flexible Update Propagation for Weakly Consistent Replication Karin Petersen, Mike K. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers Presented by: Ryan Huebsch CS294-4 P2P Systems – 10/13/03

  2. Outline • Anti-Entropy • Goals • Data Structures • Ordering • The Algorithm • Creation and Retirement • Discussion • Performance • P2P discussion/questions

  3. Anti-Entropy • Entropy - a process of degradation or running down or a trend to disorder. • Bring 2 replicas up-to-date • Three Major Design Decisions • Pairwise communication between replicas • Exchange of update operations • Ordered propagation of operations

  4. Goals • Support for arbitrary communication topologies • Operation over low-bandwidth networks • Incremental progress • Eventual consistency • Efficient storage management • Light-weight management of dynamic replica sets • Arbitrary policy choices

  5. Data Structures • Replica: • Database • Write Log • Server: • Clock • V, O • CSN, OSN • [Figure: the database and the write log, with committed writes (< CSN) and truncated writes (< OSN) marked in the log; vector V holds, for each server A, the highest A.Clock that is in the log, and vector O holds the highest A.Clock for server A that has been truncated] …
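The per-server state on this slide can be sketched as a pair of Python dataclasses. This is a minimal illustration, not Bayou's code: the names `Write`, `Server`, `has_write`, the `op` payload, and the key/value form of the database are all hypothetical. It shows how the prefix property lets a single vector entry answer "does this server already have write w?".

```python
from dataclasses import dataclass, field
from typing import Dict, List

INFINITY = float("inf")  # CSN carried by a write that is still tentative


@dataclass
class Write:
    server_id: str          # server that accepted the write
    accept_stamp: int       # that server's clock value when it accepted the write
    csn: float = INFINITY   # commit sequence number; infinity until committed
    op: str = ""            # hypothetical payload


@dataclass
class Server:
    clock: int = 0
    db: Dict[str, str] = field(default_factory=dict)   # hypothetical key/value database
    log: List[Write] = field(default_factory=list)     # writes not yet truncated
    V: Dict[str, int] = field(default_factory=dict)    # highest accept-stamp per server in the log
    O: Dict[str, int] = field(default_factory=dict)    # highest accept-stamp per server truncated
    CSN: int = 0   # committed/tentative boundary in the log
    OSN: int = 0   # truncation boundary in the log

    def has_write(self, w: Write) -> bool:
        # Prefix property: holding X's write with stamp t implies holding all
        # earlier writes from X, so one vector entry decides membership.
        return w.accept_stamp <= self.V.get(w.server_id, -1)
```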

  6. Orderings • Prefix Property • If R has a write Wi that was accepted by server X, it has all writes X accepted before Wi • Stable (Committed) Order • Decided by the primary replica • The primary assigns each write its final CSN, which is < infinity (tentative writes carry CSN = infinity) • The new CSN is propagated to other nodes • Accept Order • Orders the writes accepted by a particular server (a partial order over all writes) • Accept-stamp – logical or real-time clock
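One way to combine the two orderings into a single log order is a sort key: committed writes first by CSN, then tentative writes by accept-stamp, with the server id breaking ties. This is a sketch assuming Bayou-style tie-breaking by server id; the `Write` tuple and `log_order_key` are hypothetical names.

```python
from typing import NamedTuple

INFINITY = float("inf")


class Write(NamedTuple):
    csn: float          # INFINITY while the write is tentative
    accept_stamp: int
    server_id: str


def log_order_key(w: Write):
    # Committed writes (finite CSN) sort first, in CSN order.
    # Tentative writes share CSN = infinity, so they fall through to
    # accept-stamp order, with the server id as a deterministic tie-breaker.
    return (w.csn, w.accept_stamp, w.server_id)
```

Usage: `sorted(log, key=log_order_key)` puts the stable prefix of the log ahead of the tentative suffix.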

  7. Orderings, continued • Causal-Accept Order • The accept-stamp is a logical clock • A server's clock is advanced whenever it receives (through anti-entropy) a write with a higher accept-stamp • Increases the chance that a node sees the same database from different servers • Servers holding the same writes, even uncommitted ones, order them identically
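The clock rule for causal-accept order behaves like a Lamport clock. The sketch below is illustrative only (the `LogicalClock` class and its method names are hypothetical): a locally accepted write gets a stamp above everything seen so far, and a write arriving through anti-entropy pulls the clock forward if its stamp is higher.

```python
class LogicalClock:
    """Accept-stamp clock for causal-accept order (Lamport-style sketch)."""

    def __init__(self) -> None:
        self.value = 0

    def accept_local_write(self) -> int:
        # A newly accepted write is stamped after everything seen so far.
        self.value += 1
        return self.value

    def observe(self, accept_stamp: int) -> None:
        # Anti-entropy delivered a write: advance if its stamp is higher,
        # so later local writes are ordered after it.
        if accept_stamp > self.value:
            self.value = accept_stamp
```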

  8. The Algorithm (Quick Version) • R is being updated by S • S retrieves R.V and R.CSN • STEP 1: Decide if a full transfer is needed
      IF (S.OSN > R.CSN) THEN   [S does not have enough log: it has truncated committed writes that R has not seen]
          Roll back S's database to the state corresponding to S.O   [Remove all writes that S has a log for]
          OutputDatabase(S.DB)
          OutputVector(S.O)
          OutputOSN(S.OSN)   [R now has the same database and has truncated its write log to the same point as S]
      END
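Step 1 reduces to one comparison plus a state copy. In this sketch the rollback is replaced by an assumption: S keeps a hypothetical snapshot `db_at_osn` of its database as of the truncation point, rather than undoing logged writes. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Server:
    db: Dict[str, str] = field(default_factory=dict)         # current database (hypothetical form)
    db_at_osn: Dict[str, str] = field(default_factory=dict)  # ASSUMPTION: snapshot as of OSN
    O: Dict[str, int] = field(default_factory=dict)          # vector of truncated writes
    CSN: int = 0
    OSN: int = 0


def needs_full_transfer(S: Server, R: Server) -> bool:
    # S has discarded committed writes that R has not yet seen as committed,
    # so it can no longer send them one by one.
    return S.OSN > R.CSN


def full_transfer(S: Server, R: Server) -> None:
    # Stand-in for "roll back S's database to the state matching S.O".
    R.db = dict(S.db_at_osn)  # OutputDatabase
    R.O = dict(S.O)           # OutputVector(S.O)
    R.OSN = S.OSN             # OutputOSN
```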

  9. The Algorithm, continued • Step 2: Bring R up to date with the remaining committed writes
      IF R.CSN < S.CSN THEN   [R is missing committed writes]
          w = first committed write after R.CSN
          WHILE (w) DO
              IF w.accept-stamp <= R.V(w.server-id) THEN   [R's vector shows it already has the write]
                  OutputCommitNotification(w)
              ELSE
                  OutputWrite(w)
              END
              w = next committed write in S.log
          END
      END
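A Python rendering of Step 2 follows, assuming S's log is already sorted committed-first by CSN and that messages are modeled as hypothetical `("commit", w)` / `("write", w)` tuples. The key point is that R's version vector decides whether R needs the whole write or only the news that it committed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

INFINITY = float("inf")


@dataclass
class Write:
    server_id: str
    accept_stamp: int
    csn: float = INFINITY   # infinity while tentative


def send_committed(s_log: List[Write], s_csn: int,
                   r_v: Dict[str, int], r_csn: int) -> List[Tuple[str, Write]]:
    out: List[Tuple[str, Write]] = []
    if r_csn >= s_csn:
        return out                      # R already knows all of S's commits
    for w in s_log:
        if w.csn <= r_csn:
            continue                    # R already has this write as committed
        if w.csn > s_csn:
            break                       # reached S's tentative writes
        if w.accept_stamp <= r_v.get(w.server_id, -1):
            out.append(("commit", w))   # R holds it tentatively; send only the CSN
        else:
            out.append(("write", w))    # R lacks the write entirely
    return out
```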

  10. The Algorithm, continued • Step 3: Bring R up to date with the remaining uncommitted writes
      w = first tentative write in S.log
      WHILE (w) DO
          IF R.V(w.server-id) < w.accept-stamp THEN   [R's vector shows it lacks the write]
              OutputWrite(w)
          END
          w = next write in S.log
      END
  • Step 4: Finish up
      OutputCSN(S.CSN)
      OutputVector(S.V)
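Steps 3 and 4 can be sketched together. This version scans the whole log and skips committed writes, which is equivalent to starting at the first tentative write when committed writes precede tentative ones; the `Write` class and the message tuples are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

INFINITY = float("inf")


@dataclass
class Write:
    server_id: str
    accept_stamp: int
    csn: float = INFINITY   # infinity while tentative


def send_tentative_and_finish(s_log: List[Write], s_csn: int,
                              s_v: Dict[str, int],
                              r_v: Dict[str, int]) -> List[Tuple[str, Any]]:
    out: List[Tuple[str, Any]] = []
    # Step 3: tentative writes only (committed ones were handled in Step 2).
    for w in s_log:
        if w.csn != INFINITY:
            continue
        if r_v.get(w.server_id, -1) < w.accept_stamp:
            out.append(("write", w))    # R's vector shows it lacks this write
    # Step 4: R adopts S's commit point and version vector.
    out.append(("csn", s_csn))
    out.append(("vector", dict(s_v)))
    return out
```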

  11. Creation and Retirement • Treated just like a write (elegant) • Si is trying to join via server Sk • Sk creates a new write <infinity, Tk,i, Sk>, i.e., a tentative write with accept-stamp Tk,i • Si's server id is <Tk,i, Sk> • Si sets its clock to Tk,i + 1 • Notice that the new server id is globally unique and recursive, and could become long • The write is propagated to other nodes through anti-entropy
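Server creation as described above can be sketched in a few lines. The root name `"S0"`, the `Server` class, and `create_server` are hypothetical; the sketch only shows how the sponsor's creation write yields the recursive id <Tk,i, Sk> and the new server's starting clock.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# A server id is either a (hypothetical) root name or <accept-stamp, creator id>.
ServerId = Union[str, Tuple[int, "ServerId"]]

INFINITY = float("inf")


@dataclass
class Server:
    server_id: ServerId
    clock: int = 0
    log: List[tuple] = field(default_factory=list)

    def accept_write(self, op: str) -> Tuple[float, int, ServerId]:
        self.clock += 1
        w = (INFINITY, self.clock, self.server_id)  # <CSN, accept-stamp, server>
        self.log.append((w, op))
        return w


def create_server(sponsor: Server) -> Server:
    # Sk records the creation as an ordinary tentative write <inf, Tk,i, Sk>.
    _, t_ki, s_k = sponsor.accept_write("CREATE")
    # Si's id is <Tk,i, Sk>; its clock starts just past the creation stamp.
    return Server(server_id=(t_ki, s_k), clock=t_ki + 1)
```

Each generation nests the creator's id inside the new one, which is why the ids stay globally unique but can grow long.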

  12. Creation and Retirement, continued • Server S is updating server R • S.V has an entry for server Si (<Tk,i, Sk>), while R.V does not • Two cases: • R has not seen the creation of Si • Then R.V(Sk) < Tk,i • R has seen the retirement of Si, but S has not • Then R.V(Sk) >= Tk,i • Why? Creation and retirement are recorded as normal writes, so the prefix property holds for them. • Recursive naming helps too: if Sk has retired, R can still trace back through Sk's own id and decide the proper state. The paper explains this as the virtual CompleteV.
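The two cases, plus the recursive trace-back, can be sketched as a "complete" lookup into a version vector. This is an illustrative reading of the slide, not the paper's definition: `complete_v` is a hypothetical name, and treating an unknown root server as unseen is an assumption. A missing entry resolves to +infinity if the creation write was seen (so Si must have retired) and to -infinity if it was not.

```python
from typing import Any, Dict

INF = float("inf")


def complete_v(v: Dict[Any, int], sid: Any) -> float:
    if sid in v:
        return v[sid]
    if isinstance(sid, str):
        return -INF  # ASSUMPTION: an unknown root server counts as never seen
    t_ki, s_k = sid  # sid = <Tk,i, Sk>
    # Recurse on the creator: if the vector covers Sk's write stamped Tk,i,
    # Si's creation was seen, so its absence means it retired.
    return INF if complete_v(v, s_k) >= t_ki else -INF
```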

  13. Discussion

  14. Discussion, continued • Most properties are not special in themselves; the combination is novel • The different design decisions are mostly independent • The ideas can be applied to systems other than Bayou • Security • Certificates are used to ensure a user is allowed to make an update • Not much detail is given • Later used as an excuse for high overheads • Lots of policy decisions to be made • When to reconcile, with whom, and when to truncate the log

  15. Performance • 1316 bytes of update overhead • 520 bytes for certificate • Network transfer most significant cost

  16. Performance, continued • Hard to know if the numbers are good, nothing to compare them to • Would have been nice to see a larger deployment and measure propagation delay, consistency, etc.

  17. P2P? • Is Anti-Entropy applicable to P2P systems? • Review the goals… arbitrary topology, low b/w, aggressive storage management… • There is a centralized component (the serializer)… is this okay? • Can it handle failures/churn? • Security, what happens if there is a faulty node?
