Replication



  1. Replication CSE 593 - Transaction Processing Philip A. Bernstein

  2. Outline 1. Introduction 2. Single-Master Primary-Copy Replication 3. Multi-Master Replication 4. Other Approaches

  3. 1. Introduction • Replication - using multiple copies of a server (called replicas) for better availability and performance. • If you’re not careful, replication can lead to • worse performance - updates must be applied to all replicas and kept synchronized • worse availability - some algorithms require multiple replicas to be operational for any of them to be used

  4. Replicated Servers • Can replicate servers on a common resource • e.g., data sharing, where the DB system runs on clients communicating with a shared disk • Helps availability in a primary-backup scenario • Helps performance only if • light update traffic or • little conflict between transactions at different servers or • loose coherence guarantees (e.g. read committed) [Figure: a Client talks to Server Replica 1 and Server Replica 2, which share a single Resource]

  5. Replicated Resource • To get more improvement in availability, replicate the resources (too). • Also increases potential throughput • This is what’s usually meant by replication and it’s the scenario we’ll focus on [Figure: each Client talks to Server Replica 1 or Server Replica 2, and each server has its own Resource replica]

  6. Synchronous Replication • Replicas should behave functionally like non-replicated servers • Synchronous replication - each transaction updates all replicas of every item it updates • The algorithms are complex and expensive, due to failure and recovery handling • Too expensive for most applications due to heavy distributed transaction load • No general-purpose database system products support it [Figure: one transaction - Start, Write(x1), Write(x2), Write(x3), Commit - updates all three replicas x1, x2, x3]
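
A minimal sketch of synchronous replication in Python, with plain dictionaries standing in for the replicas; a real system would run the writes under two-phase commit (all names are illustrative):

def synchronous_write(replicas, item, value):
    # Every replica of the item is updated inside the same transaction;
    # if any replica is unavailable, the whole transaction must abort.
    for r in replicas:
        r[item] = value

x1, x2, x3 = {}, {}, {}
synchronous_write([x1, x2, x3], "x", 1)
print(x1, x2, x3)   # all three copies already agree at commit time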

  7. Asynchronous Replication • Asynchronous replication - each transaction updates one replica and the update is propagated later to other replicas • Apply deferred updates to all replicas in the same order • If new updates stopped, replicas would converge to the same state. • Minimize synchronization expense of applying updates in the same order everywhere [Figure: a transaction - Start ... Write(x1) ... Commit - updates x1 only; later, Write(x2) and Write(x3) propagate the change to x2 and x3]
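
A sketch of the asynchronous scheme, assuming in-memory copies and a single ordered queue of deferred, committed updates (class and method names are illustrative):

from collections import deque

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}          # item -> value
    def apply(self, item, value):
        self.data[item] = value

class AsyncPropagator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = deque()  # deferred updates, kept in commit order
    def commit_local(self, replica, item, value):
        replica.apply(item, value)          # update one replica now
        self.pending.append((item, value))  # defer the rest
    def propagate(self):
        # Later: apply the deferred updates to every replica in the same order.
        while self.pending:
            item, value = self.pending.popleft()
            for r in self.replicas:
                r.apply(item, value)

r1, r2, r3 = Replica("x1"), Replica("x2"), Replica("x3")
prop = AsyncPropagator([r1, r2, r3])
prop.commit_local(r1, "x", 42)   # transaction commits at x1 only
prop.propagate()                 # x2 and x3 converge later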

  8. Communications Failures • Algorithm must continue operating during failures and recoveries • Communication failures are a big problem, because two independent disconnected nodes could process conflicting updates • Solutions • Only allow one set of connected nodes to run updates • Allow independent disconnected nodes to update and reconcile conflicting updates when they’re reunited

  9. 2. Single-Master Primary-Copy Replication • Designate one replica as the primary copy • Transactions may update only the primary copy • Updates to the primary are sent later to the secondary replicas in the same order they were applied to the primary [Figure: transactions T1, T2, ..., Tn update the primary copy x1; the updates are then sent to the secondaries x2, ..., xm]

  10. Synchronous Update Propagation • Can apply updates to secondaries within the original transaction, but • it’s expensive (requires 2-phase commit) • can’t control when updates are applied to a replica • requires heavy-duty synchronization of failure handling • Example (xA, xB are replicas of x; yC, yD are replicas of y): T1 runs R1[xA], yD fails, W1[yC]; meanwhile T2 runs R2[yD], xA fails, W2[xB] • Not equivalent to a one-copy execution, even if xA and yD never recover!

  11. Asynchronous Update Propagation • Most DB systems collect updates at primary in a log and propagate them asynchronously to replicas • Post-process the log to minimize its size • Include only committed updates • Include fine-grained updates • Alternative - capture primary copy updates in a database table, via SQL triggers • SQL trigger fires on every update to primary • Most DB systems support this today. • Originally introduced by Tandem Non-Stop SQL and VAX Data Distributor.
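
A sketch of the log post-processing step that keeps only committed updates, assuming the log is a list of update and commit records (the record layout is illustrative):

def committed_updates(log):
    # First pass: find the transactions that committed.
    committed = {rec["txn"] for rec in log if rec["kind"] == "commit"}
    # Second pass: keep only their updates, in log order.
    return [rec for rec in log
            if rec["kind"] == "update" and rec["txn"] in committed]

log = [
    {"kind": "update", "txn": "T1", "item": "x", "value": 1},
    {"kind": "update", "txn": "T2", "item": "y", "value": 7},
    {"kind": "commit", "txn": "T1"},
    # T2 never committed, so its update is filtered out before propagation
]
print(committed_updates(log))   # only T1's update survives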

  12. Replicating Requests • Replicate the request to run the transaction rather than the updates produced by the transaction • To ensure requests run in the same order at primary and replica, either • run requests serially at the replica in the order they ran at the primary, or • run requests synchronously at all replicas, but commit even if one replica fails (this needs a recovery procedure for failed replicas) • Supported by Digital’s RTR.

  13. Failure Handling • [back to replicating updates] • Secondary failure - nothing to do till it recovers • At recovery, apply the updates it missed while down • Needs to determine which updates it missed, but this is no different than log-based recovery • If down for too long, may be faster to get a whole copy • Primary failure - elect a new primary • A secondary that detects primary’s failure announces a new election by broadcasting its unique replica identifier • Other secondaries reply with their replica identifier • The largest replica identifier wins

  14. Failure Handling (cont’d) • Primary failure (cont’d) • All replicas must now check that they have the same updates from the failed primary • During the election, each replica reports the id of the last log record it received from the primary • Could still lose an update that committed at the primary and wasn’t forwarded before the primary failed … but solving it requires synchronous replication (2-phase commit to propagate updates to replicas)
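
A sketch of the election and catch-up check from the last two slides, assuming each surviving secondary reports its replica id and the id of the last log record it received from the failed primary (the report format is illustrative):

def run_election(reports):
    # reports: {replica_id: last_log_record_received_from_failed_primary}
    new_primary = max(reports)               # largest replica id wins
    newest_record = max(reports.values())    # most complete log position seen
    behind = {r for r, last in reports.items() if last < newest_record}
    return new_primary, newest_record, behind

primary, newest, behind = run_election({3: 120, 7: 118, 5: 120})
print(primary, newest, behind)   # replica 7 wins but must catch up to record 120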

  15. Communications Failures • Secondaries can’t distinguish a primary failure from a communication failure that partitions the network. • If the secondaries elect a new primary and the old primary is still running, there will be a reconciliation problem when they’re reunited. • To avoid this, at most one partition may operate, and that partition must be able to determine it’s the only one allowed to operate without communicating with the other partitions. • Could make a static decision - the partition that has the primary wins. • Dynamic solutions are based on Majority Consensus

  16. Majority Consensus • Whenever a set of communicating replicas detects a replica failure or recovery, they test whether they have a majority (more than half) of the replicas. • If so, they can elect a primary • Only one set of replicas can have a majority. • Doesn’t work well with an even number of copies • Useless with 2 copies • Quorum consensus • Give a weight to each replica • The replica set that has a majority of the weight wins • E.g. with 2 replicas, one has weight 1, the other weight 2
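
A sketch of the quorum test, assuming a static weight per replica (weights and replica names are illustrative):

def has_quorum(connected, weights):
    # A set of communicating replicas may elect a primary only if it holds
    # a strict majority of the total weight.
    total = sum(weights.values())
    return sum(weights[r] for r in connected) > total / 2

weights = {"A": 1, "B": 2}            # 2 replicas with unequal weights
print(has_quorum({"B"}, weights))     # True: B alone holds 2 of 3
print(has_quorum({"A"}, weights))     # False: A alone holds only 1 of 3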

  17. 3. Multi-Master Replication • Some systems must operate when partitioned • Classic example - a salesperson’s disconnected laptop • Customer table (rarely updated) • Orders table (insert mostly) • Customer log table (append only) • So conflicting updates from different salespeople are rare • Use the primary-copy algorithm, with multiple masters • Each master exchanges updates with other replicas when it reconnects to the network • What about conflicting updates? • Implemented in Lotus Notes, Microsoft Access, Oracle, ...

  18. Example of Conflicting Updates: A Classic Race Condition
      Replica 1          Primary            Replica 2
      Initially x=0      Initially x=0      Initially x=0
      T1: X=1                               T2: X=2
      Send(X=1)                             Send(X=2)
                         X=1, Send(X=1)
                         X=2, Send(X=2)
      (the forwarded updates arrive and are applied in different orders at the two replicas)
      • Replicas end up in different states

  19. Thomas’ Write Rule • To ensure replicas end up in the same state • Tag each data item with a timestamp • A transaction updates the value and timestamp of data items (timestamps monotonically increase) • An update to a replica is applied only if the update’s timestamp is greater than the data item’s timestamp • You only need to keep timestamps of data items that were recently updated (where an older update could still be floating around the system)
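
A sketch of Thomas’ write rule, assuming each item carries a value and a timestamp (names are illustrative):

class Item:
    def __init__(self, value, ts):
        self.value, self.ts = value, ts

def apply_update(item, new_value, new_ts):
    # Apply an incoming replicated update only if its timestamp is greater
    # than the timestamp already stored with the item.
    if new_ts > item.ts:
        item.value, item.ts = new_value, new_ts
    # else: the update is stale; discarding it still lets replicas converge

x = Item(value=0, ts=0)
apply_update(x, 2, new_ts=20)     # applied
apply_update(x, 1, new_ts=10)     # ignored: older update arriving late
print(x.value)                    # 2 at every replica, regardless of arrival order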

  20. Wingman Algorithm • Multi-master replication without a primary • In Microsoft Access 7.0 and Visual Basic 4.0 • Each row of a table has 4 additional columns • globally unique id (GUID) • generation number, to determine which updates from other replicas have been applied • version number = a count of the number of updates to this row • array of [replica, version number] pairs, identifying the largest version number it received for this row from every other replica.
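
A sketch of the per-row bookkeeping, assuming Python objects instead of table columns (field names are illustrative):

import uuid
from dataclasses import dataclass, field

@dataclass
class ReplicatedRow:
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))  # globally unique id
    generation: int = 0          # which exchange round last touched this row
    version: int = 0             # count of updates applied to this row
    seen: dict = field(default_factory=dict)  # replica id -> largest version received

row = ReplicatedRow()
row.version += 1                 # a local update bumps the row's version count
print(row.guid, row.version)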

  21. Wingman Generation Numbers • Each replica has a current generation number • A replica updates a row’s generation number whenever it updates the row • A replica remembers the generation number it had when it last exchanged updates with R´, for every replica R´. • A replica increments its generation number every time it exchanges updates with another replica. • So, when exchanging updates with R, it should send all rows with a generation number larger than what it had when last exchanging updates with R.
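
A sketch of the generation-number bookkeeping, assuming one object per replica that tracks its current generation, the generation it had at its last exchange with each peer, and the generation at which each row was last updated locally (all names are illustrative):

class WingmanReplica:
    def __init__(self):
        self.generation = 1
        self.last_sync = {}      # peer id -> our generation at the last exchange
        self.rows = {}           # row guid -> our generation when the row was last updated

    def update_row(self, guid):
        self.rows[guid] = self.generation

    def rows_to_send(self, peer):
        # Send all rows updated since the last exchange with this peer.
        cutoff = self.last_sync.get(peer, 0)
        return [g for g, gen in self.rows.items() if gen > cutoff]

    def finish_exchange(self, peer):
        self.last_sync[peer] = self.generation
        self.generation += 1     # bump the generation after every exchange

r = WingmanReplica()
r.update_row("row-1")
print(r.rows_to_send("R2"))      # ['row-1']
r.finish_exchange("R2")
print(r.rows_to_send("R2"))      # [] until row-1 is updated again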

  22. Exchanging Updates • Use Thomas’ Write Rule to process an update from another replica • Compare the update’s and row’s version numbers • The one with larger version number wins (use replica id to break ties) • Yields the same result at both replicas, but maybe not serializable
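
A sketch of the comparison rule, assuming version numbers and replica ids can be compared as a pair (names are illustrative):

def incoming_wins(local_version, local_replica, incoming_version, incoming_replica):
    # The larger version number wins; the replica id breaks ties.
    return (incoming_version, incoming_replica) > (local_version, local_replica)

print(incoming_wins(2, "A", 3, "B"))   # True: higher version wins
print(incoming_wins(2, "B", 2, "A"))   # False: tie broken by replica id, local row kept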

  23. Non-Serializable Execution • Suppose two replicas perform updates to x • Replica A does 2 updates, incrementing version number from 1 to 3 • Replica B does 1 update, incrementing version number from 1 to 2 • When they exchange updates, replica A has higher version number and wins, causing replica B’s update to be lost • For this reason, rejected updates are retained in a conflict table for later analysis

  24. Rejecting Duplicate Updates • Some rejected updates are surely duplicates • To identify them - • When applying an update to x, replace x’s array of [replica, version#] pairs by the update’s array. • To avoid processing the same update via many paths, check the version number of an arriving update against the array • Consider a rejected update to x at R from R´, where • [R´, V] describes R´ in x’s array, and • V´ is the version number sent by R´. • If V ≥ V´, then R already saw R´’s updates (the rejected update is a duplicate) • If V < V´, then R didn’t see R´’s updates, so store it in the conflict table for later reconciliation
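
A sketch of the duplicate test, assuming the row’s [replica, version] array is kept as a dictionary (names are illustrative):

def classify_rejected_update(x_seen, sender, sent_version):
    # x_seen: the row's [replica, version] pairs; sent_version: V' from the sender
    v = x_seen.get(sender, 0)
    if v >= sent_version:
        return "duplicate"       # this replica already saw the sender's update
    return "conflict"            # a real conflict; keep it for reconciliation

x_seen = {"R2": 5}
print(classify_rejected_update(x_seen, "R2", 4))   # duplicate
print(classify_rejected_update(x_seen, "R2", 6))   # conflict -> conflict table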

  25. Multi-Master Performance • The longer a replica is disconnected and performing updates, the more likely it will need reconciliation • The amount of propagation activity increases with more replicas • If each replica is performing updates, the effect is quadratic

  26. 4. Other Approaches • Non-transactional replication using timestamped updates and variations of Thomas’ write rule • directory services are managed this way • Quorum consensus per-transaction • Read and write a quorum of copies • Each data item has a version number and timestamp • A read operation chooses a replica with the largest version number • A write operation sets the version number to one greater than any it has seen • No special work needed during a failure or recovery
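
A toy sketch of per-transaction quorum reads and writes with three copies and majority quorums; here both operations touch the first two copies, but any quorum-sized subset works because every read quorum intersects every write quorum (names are illustrative):

def quorum_read(copies, quorum):
    # Read a quorum of copies and return the one with the largest version number.
    return max(copies[:quorum], key=lambda c: c["version"])

def quorum_write(copies, quorum, value):
    # Write a quorum of copies with a version number one greater than any seen.
    new_version = max(c["version"] for c in copies[:quorum]) + 1
    for c in copies[:quorum]:
        c["version"], c["value"] = new_version, value

copies = [{"version": 0, "value": None} for _ in range(3)]
quorum_write(copies, quorum=2, value="v1")       # writes 2 of the 3 copies
print(quorum_read(copies, quorum=2)["value"])    # "v1": the read quorum sees the write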

  27. Other Approaches (cont’d) • Read-one replica, write-all-available replicas • Requires careful management of failures and recoveries • E.g., the virtual partition algorithm • Each node knows the nodes it can communicate with, called its view • Transaction T can execute if its home node has a view that includes a quorum of T’s readset and writeset (i.e. the data it reads or writes) • If a node fails or recovers, run a view formation protocol (much like an election protocol) • For each data item with a read quorum in the view, read the latest version and update the copies with smaller version numbers.
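
A sketch of the view check, assuming majority quorums over each item’s copies (item and copy names are illustrative):

def can_execute(view, copies_per_item, readset, writeset):
    # T may run only if the home node's view holds a quorum (here a strict
    # majority) of the copies of every item in T's readset and writeset.
    for item in set(readset) | set(writeset):
        copies = copies_per_item[item]
        in_view = [c for c in copies if c in view]
        if len(in_view) * 2 <= len(copies):
            return False
    return True

copies_per_item = {"x": ["xA", "xB", "xC"], "y": ["yA", "yB"]}
view = {"xA", "xB", "yA", "yB"}
print(can_execute(view, copies_per_item, ["x"], ["y"]))    # True: quorums of both x and y
print(can_execute({"xA"}, copies_per_item, ["x"], ["y"]))  # False: no quorum of x's copies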
