CS 603 Data Replication

Presentation Transcript


  1. CS 603 Data Replication. February 25, 2002

  2. Data Replication: Why? • Fault Tolerance • Hot backup • Catastrophic failure • Performance • Parallelism • Decreased reliance on network. This is a two-edged sword.

  3. Data Replication: What? • Correctness criterion: Replication invisible • Results indistinguishable from one-copy database • One-copy serializability (1SR) • Alternatives • Bounded inconsistency • User selection of real/copy. More discussion Friday.

  4. Data Replication: How? • Goal: Ensure one-copy serializability • Write-all solution: All copies identical • Write goes to every site • Read from any site • Standard single-copy concurrency control • Guarantees 1SR • Single-copy concurrency control gives serializable execution • Equivalent to serial execution where all writes happen in one transaction

  5. Write All Approach [Figure: a Writer and a Reader acting on replicated copies; the copy values shown are 3 and 5.]

  6. Problem: Site Failure • Failure causes write to block • Must maintain locks • Clogs up entire system. Is this fault tolerance? • What about “write all available”? • T0: w0[xA] w0[xB] w0[yC] c0 • B fails • T1: r1[yC] w1[xA] c1 • B recovers • T2: r2[xB] w2[yC] c2 • What is the equivalent serial order?
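
The question on this slide can be explored by brute force. The sketch below is an illustration, not part of the original deck: it encodes the one-copy view of the history (T1 reads y from T0 and writes x; T2 reads x from T0, because copy xB missed T1's write while B was down, and writes y) and searches every serial order for one with the same reads-from relation. The names `READS_FROM`, `WRITES`, and both functions are hypothetical, and the check assumes each transaction reads before it writes.

```python
from itertools import permutations

# One-copy view of the slide's history.
# (reader, item) -> transaction whose write the reader observed.
READS_FROM = {
    ("T1", "y"): "T0",  # r1[yC] sees w0[yC]
    ("T2", "x"): "T0",  # r2[xB] sees w0[xB]; B missed w1[xA] while down
}
WRITES = {"T0": {"x", "y"}, "T1": {"x"}, "T2": {"y"}}


def reads_from_in_serial(order, reads_from):
    """Reads-from relation produced by running the transactions serially."""
    last_writer = {}
    result = {}
    for txn in order:
        # Reads first: each read sees the latest earlier writer (or None).
        for reader, item in reads_from:
            if reader == txn:
                result[(reader, item)] = last_writer.get(item)
        for item in WRITES[txn]:
            last_writer[item] = txn
    return result


def equivalent_serial_orders(reads_from=READS_FROM):
    """Serial orders whose reads-from relation matches the replicated run."""
    return [
        order
        for order in permutations(WRITES)
        if reads_from_in_serial(order, reads_from) == reads_from
    ]


print(equivalent_serial_orders())  # prints []
```

No serial order qualifies: T2 reading x from T0 forces T2 before T1, while T1 reading y from T0 forces T1 before T2.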

  7. Model for Replicated Data • Data and Transaction Managers at each site • Data Manager: local concurrency control to guarantee local serializability • Transaction manager: Distributed actions • Turns reads/writes into multi-site reads/writes • Runs commit protocol • Directory to get sites of each copy
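
As a rough illustration of how a transaction manager might turn logical reads and writes into multi-site operations, the sketch below uses a directory lookup with a write-all, read-one policy. The `DIRECTORY` contents and the `translate` function are hypothetical; the copy placement simply mirrors the xA, xB, yC notation used on the other slides.

```python
# Hypothetical directory: logical item -> sites that hold a copy of it.
DIRECTORY = {"x": ["A", "B"], "y": ["C"]}


def translate(op, txn, item, directory=DIRECTORY, read_site=None):
    """Map one logical operation to copy-level operations.

    Writes go to every copy listed in the directory; a read goes to one
    copy (the first listed unless read_site is given).
    """
    sites = directory[item]
    if op == "w":
        return [f"w{txn}[{item}{site}]" for site in sites]
    if op == "r":
        site = sites[0] if read_site is None else read_site
        if site not in sites:
            raise ValueError(f"site {site} holds no copy of {item}")
        return [f"r{txn}[{item}{site}]"]
    raise ValueError(f"unknown operation {op!r}")


print(translate("w", 0, "x"))                 # prints ['w0[xA]', 'w0[xB]']
print(translate("r", 2, "x", read_site="B"))  # prints ['r2[xB]']
```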

  8. Failure Assumptions • Communications failure: Site A does not receive reads/writes on xA issued by B • Site failure: Site A is unable to process reads/writes on xA issued by B • Communications failure: Site A processes but does not acknowledge reads/writes on xA issued by B • Fail-stop model, detectable by timeout

  9. Types of Write • Write(x): All copies of x will eventually be written • Immediate write • Send write to all sites on request • Quick detection of conflict • Delayed write • Delays non-local writes until commit • Minimizes message traffic • Abort is cheap • Primary copy write • Quick detection of conflict • Lower message traffic than immediate write
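
The delayed-write idea can be sketched as follows. This is a toy model under stated assumptions: each site's store is a plain dict, there is no locking or commit protocol, and the class name `DelayedWriteTxn` and its message counter are hypothetical. It only shows why repeated writes cost one message per remote site and why abort needs no remote traffic.

```python
_MISSING = object()  # marks an item that did not exist before the transaction


class DelayedWriteTxn:
    """Applies writes locally at once; ships them to other sites at commit."""

    def __init__(self, local_site, sites):
        self.local_site = local_site
        self.sites = sites   # site name -> dict standing in for that site's store
        self.pending = {}    # latest value per item, not yet sent to other sites
        self.undo = {}       # local before-images, used only by abort
        self.messages = 0    # remote messages sent so far

    def write(self, item, value):
        local = self.sites[self.local_site]
        self.undo.setdefault(item, local.get(item, _MISSING))
        local[item] = value
        self.pending[item] = value

    def commit(self):
        if self.pending:
            for name, store in self.sites.items():
                if name != self.local_site:
                    store.update(self.pending)  # one batched message per site
                    self.messages += 1
        self.pending.clear()
        self.undo.clear()

    def abort(self):
        # Only the local site saw the writes, so only it is rolled back.
        local = self.sites[self.local_site]
        for item, old in self.undo.items():
            if old is _MISSING:
                local.pop(item, None)
            else:
                local[item] = old
        self.pending.clear()
        self.undo.clear()
```

The price of the saved messages is that a conflict at a remote copy is not noticed until commit, which is the contrast the slide draws with immediate write.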

  10. Distributed Serializability • A complete replicated data (RD) history $H$ over $T = \{T_0, \ldots, T_n\}$ is a partial order with ordering relation $<$ where • $H = h\left(\bigcup_{i=0}^{n} T_i\right)$ for some translation function $h$ • for each $T_i$ and all operations $p_i, q_i$ in $T_i$, if $p_i <_i q_i$, then every operation in $h(p_i)$ is related by $<$ to every operation in $h(q_i)$ • for every $r_j[x_A]$, there is at least one $w_i[x_A] < r_j[x_A]$ • if $w_i[x] \in H$ and $r_j[x] \in H$, then $w_i[x] < r_j[x]$ or $r_j[x] < w_i[x]$ • if $w_i[x] <_i r_i[x]$ and $h(r_i[x]) = r_i[x_A]$, then $w_i[x_A] \in h(w_i[x])$ • Theorem: If the reads-from relationships are the same as in a serial history, the RD history is one-copy serializable (1SR)

  11. Write All Available Fails, Even if No Recovery!

  12. Solutions • Validate availability on commit • Check if any failed writes now available • Check that all sites read or written still available • Enforces serializability for site failures. Doesn’t work with communication failures!
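
The two commit-time checks listed on this slide might look like the sketch below. It assumes the transaction manager has recorded which sites the transaction read or wrote (`accessed`) and which sites it skipped because they were down (`missed`), and that some failure detector is available as a callable `is_up`. All three names are hypothetical.

```python
def validate_at_commit(accessed, missed, is_up):
    """Return True if the transaction may commit under both checks.

    accessed: sites the transaction read from or wrote to
    missed:   sites whose copies it could not write because they were down
    is_up:    callable mapping a site name to True/False
    """
    # Check 1: no write that was skipped could be performed now.
    missed_still_down = all(not is_up(site) for site in missed)
    # Check 2: every site that was read or written is still available.
    accessed_still_up = all(is_up(site) for site in accessed)
    return missed_still_down and accessed_still_up


up_sites = {"A", "C"}  # hypothetical view at commit time: B is down
print(validate_at_commit({"A", "C"}, {"B"}, up_sites.__contains__))  # prints True
```

If B had come back before commit, or A had gone down, the same call would return False and the transaction would have to abort.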

  13. Communication Failures • Available copies fails on network partition • Each side succeeds in validation • Write all blocks • Write n-k, read k+1 • Generalization of the “write all” approach • Handles up to min(n-k, k+1) failures • Tradeoff read vs. write performance • Partition effect based on size of partition: • <k+1: small partition acts as if all sites failed, large continues • Otherwise entire system becomes read-only
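
The partition behaviour described above can be checked with a few lines of arithmetic. The sketch assumes a write quorum of n-k and a read quorum of k+1, and additionally assumes 2(n-k) > n so that any two write quorums overlap; that extra condition is not stated on the slide. The function names are hypothetical.

```python
def quorum_sizes(n, k):
    """Return (write_quorum, read_quorum) for 'write n-k, read k+1'."""
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    if 2 * (n - k) <= n:
        # Assumption of this sketch: two write quorums must intersect.
        raise ValueError("need 2*(n-k) > n")
    return n - k, k + 1


def partition_mode(n, k, size):
    """What a partition containing `size` of the n sites can still do."""
    write_q, read_q = quorum_sizes(n, k)
    if size >= write_q:
        return "read-write"   # write_q >= read_q under the assumption above
    if size >= read_q:
        return "read-only"
    return "unavailable"


# n = 5, k = 1: write 4 copies, read 2 copies.
print(partition_mode(5, 1, 1), partition_mode(5, 1, 4))  # prints unavailable read-write
print(partition_mode(5, 1, 2), partition_mode(5, 1, 3))  # prints read-only read-only
```

Because the read and write quorums sum to n+1, every read quorum overlaps every write quorum, so a read always reaches at least one copy that saw the latest write.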

  14. Other approaches: Don’t enforce serializability! • Master copy • Writes must update master copy • Reads can be consistent or inconsistent • Bounded inconsistency • Time bound on update of copies • Value bound: write all if difference too great • Dumps consistency on the application • Added complexity • Better performance
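
One way to read the value bound is sketched below: every write updates a master copy, and the other copies are refreshed only when the new value drifts more than a fixed bound from the value they last received. Combining the master copy with the value bound in this way is an assumption made for illustration, and the class `ValueBoundedCopy` is hypothetical.

```python
class ValueBoundedCopy:
    """Master copy plus replicas that may lag by at most `bound`."""

    def __init__(self, initial, bound, replicas):
        self.master = initial
        self.bound = bound
        self.replicas = [initial] * replicas
        self.propagated = initial  # value the replicas last received

    def write(self, value):
        """Update the master; return True if the write was sent to all copies."""
        self.master = value
        if abs(value - self.propagated) > self.bound:
            self.replicas = [value] * len(self.replicas)
            self.propagated = value
            return True
        return False


copy = ValueBoundedCopy(initial=100, bound=10, replicas=2)
print(copy.write(105), copy.replicas)  # prints False [100, 100]
print(copy.write(111), copy.replicas)  # prints True [111, 111]
```

A reader of a replica may see a stale value, but never one more than `bound` away from the master, which is the guarantee the application has to be written against.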
