
CSIS 7102 Spring 2004 Lecture 6: Distributed databases

This lecture covers the limitations of locking techniques, timestamp ordering, view serializability, optimistic concurrency control, and graph-based locking in distributed databases. It discusses the advantages of distributed databases, such as data sharing, autonomy, and higher system availability. The lecture also explains the challenges of atomicity and introduces the two-phase commit protocol.



  1. CSIS 7102 Spring 2004 Lecture 6: Distributed databases Dr. King-Ip Lin

  2. Table of contents • Limitation of locking techniques • Timestamp ordering • View serializability • Optimistic concurrency control • Graph-based locking • Multi-version schemes

  3. Distributed databases • So far, we have assumed a centralized database • Data is stored in one location (e.g. a single hard disk) • A centralized database management system handles transactions • To handle multiple requests, a client-server system is used • Clients send requests for data to the server • The server handles queries, transaction management, etc.

  4. Distributed databases • This is not the only possibility • In many cases, it may be advantageous for data to be distributed • Branches of a bank • Different parts of the government storing different kinds of data about a person • Different organizations sharing part of their data • Thus, distributed databases

  5. Distributed databases • Data spread over multiple machines (also referred to as sites or nodes) • A network interconnects the machines • Data shared by users on multiple machines

  6. Distributed databases • Homogeneous distributed databases • Same software/schema on all sites, data may be partitioned among sites • Goal: provide a view of a single database, hiding details of distribution • Heterogeneous distributed databases • Different software/schema on different sites • Goal: integrate existing databases to provide useful functionality

  7. Distributed databases • Advantages of distributed databases • Sharing data – users at one site are able to access data residing at other sites. • Autonomy – each site is able to retain a degree of control over data stored locally. • Higher system availability through redundancy – data can be replicated at remote sites, and the system can function even if a site fails.

  8. Distributed databases • Key features of distributed databases • Typically geographically distributed, with (relatively) slow connections • Typically autonomous, in terms of both administration and execution • However, many cases allow for a coordinator site for each transaction (a different coordinator for each transaction) • Local vs. global transactions • A local transaction accesses data only at the single site at which the transaction was initiated. • A global transaction either accesses data in a site different from the one at which the transaction was initiated or accesses data in several different sites.

  9. Distributed databases • Global transactions lead to new issues in transaction processing • Commit coordination: each node cannot unilaterally decide to commit • A transaction cannot be committed at one site and aborted at another • Data replication: the same data may reside at different sites • Possibility of reading different copies, so locking has to be done carefully • Ensuring correctness means updates have to be done carefully

  10. Distributed databases – rules of the game • Transactions may access data at several sites. • Each site has a local transaction manager responsible for: • Maintaining a log for recovery purposes • Participating in coordinating the concurrent execution of the transactions executing at that site. • Each site has a transaction coordinator, which is responsible for: • Starting the execution of transactions that originate at the site. • Distributing subtransactions to appropriate sites for execution. • Coordinating the termination of each transaction that originates at the site, which may result in the transaction being committed at all sites or aborted at all sites.

  11. Atomicity in distributed databases • Ensuring atomicity means guarding against failures. • Many more kinds of failures in distributed databases • Failure of a site. • Loss of messages • Handled by network transmission control protocols such as TCP/IP • Failure of a communication link • Handled by network protocols, by routing messages via alternative links • Network partition • A network is said to be partitioned when it has been split into two or more subsystems that lack any connection between them • Note: a subsystem may consist of a single node • Hard to distinguish between failure and partition

  12. Atomicity in distributed databases • Challenge with respect to atomicity • Consistency over multiple sites • Cannot allow one site to commit and the other site to abort • Two basic protocols • 2-phase commit (most common) • 3-phase commit

  13. Two-phase commit • Goals • Given a transaction that is running on multiple sites, ensure that either all the sites commit together or all abort together. • Assume that when a site fails, it does not send wrong messages to confuse anyone; it just stops working • Need to handle the case where some sites fail during the 2-phase commit process

  14. Two-phase commit • Simple idea • Select one site as the coordinator (the other sites are called participants) • Ask all the sites whether each of them is willing to commit or wants to abort (phase 1) • Wait to collect all the answers and make the final decision; broadcast the decision to all the sites; sites act accordingly (phase 2) • Issues • If a site failed and then quickly recovered, how do I know what I have done? • What if a site fails in the middle? Does everybody have to wait for it? • What if the coordinator fails?

  15. Two-phase commit • If a site failed and then quickly recovered, how do I know what I have done? • Need to have a log to record what has been done • Log in “stable storage” • Should I log before I act? • Write-ahead log

  16. Two-phase commit • What if a participant site fails? • By our assumption, it will not respond • The coordinator will wait for a while, and then decide that the site has failed • The decision should be: abort • What if the coordinator fails? • Trickier; we will deal with it later

  17. Two-phase commit: phase 1 • Phase 1: coordinator asks for decisions • Coordinator (Ci) asks all participants to prepare to commit transaction T. • Ci adds the record <prepare T> to the log and forces the log to stable storage • sends prepare T messages to all sites at which T executed • Why should the coordinator write the record before sending messages?
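
A minimal Python sketch of the coordinator's phase 1 (not from the original slides); force_log and send are assumed helpers for forced logging and messaging:

    # Coordinator's phase 1 (sketch). force_log appends a record and flushes it
    # to stable storage; send delivers a message to a site. Both are assumed.
    def coordinator_phase1(T, participants, force_log, send):
        # Log <prepare T> and force it to stable storage BEFORE any message goes
        # out, so a recovering coordinator knows a 2PC round for T was started.
        force_log(("prepare", T))
        for site in participants:
            send(site, ("prepare", T))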

  18. Two-phase commit: phase 1 • Upon receiving the message, the transaction manager at the site determines if it can commit the transaction • if not, • add a record <no T> to the log • send an abort T message to Ci • if the transaction can be committed, then: • add the record <ready T> to the log • force all log records for T to stable storage • send a ready T message to Ci • Why can't the site commit right away?
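
A minimal Python sketch of the participant's side of phase 1 (not from the slides); can_commit, force_log, and send are assumed helpers:

    # Participant handling a prepare message (sketch).
    def participant_on_prepare(T, coordinator, can_commit, force_log, send):
        if not can_commit(T):
            force_log(("no", T))               # record the local abort vote
            send(coordinator, ("abort", T))
        else:
            # Force <ready T> (and all of T's log records) to stable storage first,
            # so this site can still commit T after a crash. It cannot commit right
            # away because the global decision may still turn out to be abort.
            force_log(("ready", T))
            send(coordinator, ("ready", T))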

  19. Two-phase commit: phase 2 • Phase 2: coordinator makes the decision and broadcasts the result • T can be committed if Ci received a ready T message from all the participating sites; otherwise T must be aborted. • Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces the record onto stable storage. Once the record is in stable storage it is irrevocable (even if failures occur) • Notice that the transaction is deemed committed/aborted at this point in time

  20. Two-phase commit: phase 2 • Coordinator sends a message to each participant informing it of the decision (commit or abort) • Participants take the appropriate action locally. • Each participant also records in its log whether it committed <commit T> or aborted <abort T>
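
A minimal Python sketch of phase 2 at the coordinator (not from the slides); collect_votes, force_log, and send are assumed helpers, and the 30-second timeout is an arbitrary illustration:

    # Coordinator's phase 2 (sketch). collect_votes returns a dict mapping each
    # site to "ready", "abort", or None (no reply before the timeout).
    def coordinator_phase2(T, participants, collect_votes, force_log, send):
        votes = collect_votes(T, participants, timeout=30)
        decision = "commit" if all(v == "ready" for v in votes.values()) else "abort"
        force_log((decision, T))       # the commit point: irrevocable once forced
        for site in participants:
            send(site, (decision, T))  # each participant applies and logs it locally
        return decision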

  21. Two-phase commit: participant failures • Suppose a participating site S fails. What must it do when it comes back up? • First, check what is in the log • Case 1: S sees <commit T> • Meaning: Coordinator has decided to commit T and the decision is final • Thus: S should make sure the transaction commits at that site (redo T)

  22. Two-phase commit: participant failures • Case 2: S sees <abort T> • Meaning: Coordinator has decided to abort T and the decision is final • Thus: S should make sure the transaction aborts at that site (undo T)

  23. Two-phase commit: participant failures • Case 3: S sees <ready T> • Meaning: T can be committed from the point of view of S only • Does S know the final decision yet? • Thus: S must query the coordinator about the final decision, and act accordingly

  24. Two-phase commit: participant failures • Case 4: S sees nothing • Meaning: S has not even responded to the initial query from the coordinator • Thus: S must send its decision to the coordinator • But is it really necessary? • If the coordinator does not hear from S for a long time, it will assume S has failed, thus aborting the transaction • Thus S can safely decide to abort without any problem and without sending its decision to the coordinator (Why?)
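
The four recovery cases can be summarized in a small Python sketch (not from the slides); redo, undo, and ask_coordinator are assumed helpers, and last_record is the last 2PC log record for T found at the recovering site S:

    # Participant recovery after a crash (sketch).
    def participant_recover(T, last_record, redo, undo, ask_coordinator):
        if last_record == ("commit", T):
            redo(T)                         # case 1: global decision was commit
        elif last_record == ("abort", T):
            undo(T)                         # case 2: global decision was abort
        elif last_record == ("ready", T):
            # case 3: S voted ready but never learned the outcome; must ask
            if ask_coordinator(T) == "commit":
                redo(T)
            else:
                undo(T)
        else:
            undo(T)                         # case 4: never voted; coordinator aborts T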

  25. Two-phase commit: coordinator failure • Suppose the coordinator fails • Then the participants must make a decision • Case 1: a site sees <commit T> • Meaning: T has committed • Thus: broadcast the result and ensure everyone commits • Case 2: a site sees <abort T> • Meaning: T has aborted • Thus: broadcast the result and ensure everyone aborts

  26. Two-phase commit: coordinator failure • Case 3: a site sees nothing • Meaning: No decision has been made (or a decision has been made to abort) • Thus: it is safe to abort (instead of waiting for the coordinator)

  27. Two-phase commit: coordinator failure • Case 4: none of the above • Meaning: every participant that is alive has told the coordinator that it can commit • Thus, it is possible that the coordinator has made a decision but has yet to send it out • Note that the decision may still be to abort T • All participants must wait for the coordinator to recover to learn its decision • Thus two-phase commit is blocking in this case
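
A minimal Python sketch of what a surviving participant can conclude from its own log when the coordinator fails (not from the slides); local_log_record is that participant's last 2PC record for T and broadcast is an assumed helper:

    # Termination attempt when the coordinator appears to have failed (sketch).
    def on_coordinator_failure(T, local_log_record, broadcast):
        if local_log_record == ("commit", T):
            broadcast(("commit", T))    # case 1: decision known, propagate commit
            return "commit"
        if local_log_record == ("abort", T):
            broadcast(("abort", T))     # case 2: decision known, propagate abort
            return "abort"
        if local_log_record is None:
            broadcast(("abort", T))     # case 3: no <ready T> logged, safe to abort
            return "abort"
        return "block"                  # case 4: all live sites are <ready T>; wait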

  28. Two-phase commit: network partition • If the coordinator and all its participants remain in one partition, the failure has no effect on the commit protocol. • If the coordinator and its participants belong to several partitions: • Sites that are not in the partition containing the coordinator think the coordinator has failed, and execute the protocol to deal with failure of the coordinator. • No harm results, but sites may still have to wait for the decision from the coordinator. • The coordinator and the sites that are in the same partition as the coordinator think that the sites in the other partitions have failed, and follow the usual commit protocol. • Again, no harm results

  29. Three-phase commit • Limitation of two-phase commit • Blocking when the coordinator dies • To overcome it, create a new phase called pre-commit • Coordinator tells at least k sites that it wants to commit • Thus now, 3 phases • Phase 1: Coordinator checks if T can commit; participants send their choices to the coordinator • Phase 2: Coordinator makes the decision • If commit, send a pre-commit message to k sites • If abort, send a message to everyone to abort • Phase 3: • If commit, the final commit decision is broadcast and everyone commits

  30. Three-phase commit • What 3-phase commit buys: • If the coordinator fails, then participants can figure out the commit decision from the pre-commit messages and then go on to commit • If no pre-commit message is found, one can safely abort • No blocking • Limitations: • No more than k sites can fail • Otherwise, pre-commit messages may be lost • Network partition can cause problems • Maybe all the pre-commit messages reside in one partition • Thus, not widely used

  31. Concurrency control in distributed databases • Modify concurrency control schemes for use in distributed environment. • Assumptions: • Each site participates in the execution of a commit protocol to ensure global transaction atomicity. • Data item may be replicated at multiple sites • However, updates (writes) have to be done on ALL the copies of an item

  32. Locking protocols in distributed databases • Two-phase locking based protocols • Key questions: • Who manages the locks? • Centralized vs. distributed • How many items to lock? • In the case when data has copies at multiple sites • Tradeoff between efficiency and concurrency • Efficiency includes messages sent between sites

  33. Locking protocols in distributed databases – centralized vs. distributed • Centralized lock manager • All lock requests for all items go to one site, the lock-manager site Si • Even if the item does not reside at that site • When a transaction needs to lock a data item, it sends a lock request to Si, and the lock manager determines whether the lock can be granted immediately • If yes, the lock manager sends a message to the site which initiated the request • If no, the request is delayed until it can be granted, at which time a message is sent to the initiating site

  34. Locking protocols in distributed databases – centralized vs. distributed • Centralized lock manager • After obtaining the lock • A transaction can read from any one site that contains the item • A transaction must write to ALL sites that contain the item • Advantages • Simple to implement • Simple deadlock handling • Disadvantages • Bottleneck at the lock manager • Vulnerability – if the lock-manager site goes down, everything is blocked
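
A minimal Python sketch of the grant check inside a centralized lock manager (not from the slides); the lock-table layout is an assumption for illustration:

    # Centralized lock manager grant check (sketch). lock_table maps each item
    # to the list of (transaction, mode) pairs currently holding locks on it.
    def can_grant(lock_table, item, txn, mode):
        holders = lock_table.get(item, [])
        if mode == "S":
            # a shared request is compatible with other shared locks
            return all(m == "S" for _, m in holders)
        # an exclusive request needs the item free (or held only by txn itself)
        return all(t == txn for t, _ in holders)

    locks = {"X": [("T1", "S")]}
    print(can_grant(locks, "X", "T2", "S"))   # True: grant immediately
    print(can_grant(locks, "X", "T2", "X"))   # False: delay until T1 releases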

  35. Locking protocols in distributed databases – centralized vs. distributed • Distributed lock manager • Each site has its own lock manager to handle requests for items • Need special protocols to access data • Advantages • Distributed workload • Fault-tolerant • Disadvantages • Deadlock handling is complicated • Potentially more messages.

  36. Locking protocols in distributed databases – Distributed protocols • Primary copy • Choose one replica of data item to be the primary copy. • Site containing the replica is called the primary site for that data item • Different data items can have different primary sites • When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q. • Implicitly gets lock on all replicas of the data item

  37. Locking protocols in distributed databases – Distributed protocols • Primary copy • Benefit • Concurrency control for replicated data handled similarly to unreplicated data - simple implementation. • Drawback • If the primary site of Q fails, Q is inaccessible even though other sites containing a replica may be accessible

  38. Locking protocols in distributed databases – Distributed protocols • Majority protocol • Local lock manager at each site administers lock and unlock requests for data items stored at that site. • When a transaction wishes to lock an unreplicated data item Q residing at site Si, a message is sent to Si ‘s lock manager. • If Q is locked in an incompatible mode, then the request is delayed until it can be granted. • When the lock request can be granted, the lock manager sends a message back to the initiator indicating that the lock request has been granted.

  39. Locking protocols in distributed databases – Distributed protocols • Majority protocol • In case of replicated data • If Q is replicated at n sites, then a lock request message must be sent to more than half of the n sites in which Q is stored. • The transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q. • When writing the data item, transaction performs writes on all replicas.
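
A minimal Python sketch of majority locking for a replicated item Q (not from the slides); request_lock(site, Q, mode) is an assumed helper that returns True if that site's lock manager grants the lock:

    # Majority protocol for a replicated item (sketch).
    def majority_lock(Q, replica_sites, mode, request_lock):
        granted = [s for s in replica_sites if request_lock(s, Q, mode)]
        # operate on Q only after locking strictly more than half of the replicas;
        # a write must still be performed on all replicas afterwards
        if len(granted) > len(replica_sites) // 2:
            return granted
        return None   # no majority: release the locks and retry or abort (not shown)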

  40. Locking protocols in distributed databases – Distributed protocols • Majority protocol • Benefit • Can be used even when some sites are unavailable • Details on how to handle writes in the presence of site failures come later • Drawback • Requires 2(n/2 + 1) messages for handling lock requests, and (n/2 + 1) messages for handling unlock requests. • Potential for deadlock even with a single item - e.g., each of 3 transactions may have locks on 1/3rd of the replicas of a data item. • Can be overcome by predetermining the order of sites being locked

  41. Locking protocols in distributed databases – Distributed protocols • Biased protocol (read-one, write-all) • Local lock manager at each site as in the majority protocol; however, requests for shared locks are handled differently than requests for exclusive locks. • Shared locks: When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site containing a replica of Q. • Exclusive locks: When a transaction needs to lock data item Q, it requests a lock on Q from the lock managers at all sites containing a replica of Q. • Advantage - imposes less overhead on read operations. • Disadvantage - additional overhead on writes
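
For contrast, a minimal Python sketch of the biased rule (not from the slides), reusing the same assumed request_lock helper as in the majority sketch above:

    # Biased (read-one, write-all) protocol (sketch).
    def biased_lock(Q, replica_sites, mode, request_lock):
        if mode == "S":
            # read: a shared lock at any single replica suffices
            return any(request_lock(s, Q, "S") for s in replica_sites)
        # write: an exclusive lock is needed at every replica
        return all(request_lock(s, Q, "X") for s in replica_sites)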

  42. Locking protocols in distributed databases – Distributed protocols • Quorum Consensus Protocol • A generalization of both majority and biased protocols • Each site is assigned a weight. • Let S be the total of all site weights • Choose two values: read quorum Qr and write quorum Qw • Such that Qr + Qw > S and 2 * Qw > S • Quorums can be chosen (and S computed) separately for each item • Each read must lock enough replicas that the sum of the site weights is >= Qr • Each write must lock enough replicas that the sum of the site weights is >= Qw • For now we assume all replicas are written • Extensions to allow some sites to be unavailable described later
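
A small Python sketch of the quorum constraints (not from the slides); the site weights and quorum values are made-up examples:

    # Quorum consensus checks (sketch) with hypothetical weights.
    weights = {"S1": 1, "S2": 1, "S3": 2}
    S = sum(weights.values())                 # total weight S = 4

    def quorums_valid(Qr, Qw, S):
        # reads must overlap writes (Qr + Qw > S), and writes must overlap (2*Qw > S)
        return Qr + Qw > S and 2 * Qw > S

    def has_quorum(locked_sites, quorum):
        return sum(weights[s] for s in locked_sites) >= quorum

    print(quorums_valid(Qr=2, Qw=3, S=S))     # True: 2+3 > 4 and 2*3 > 4
    print(has_quorum({"S1", "S3"}, 2))        # True: weight 3 meets a read quorum of 2
    print(has_quorum({"S3"}, 3))              # False: weight 2 misses a write quorum of 3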

  43. Deadlocks in distributed databases • Deadlock can occur in distributed databases • Even worse, deadlocks can be distributed • Consider the following two transactions and history, with item X and transaction T1 at site 1, and item Y and transaction T2 at site 2:

      T1: write(X); write(Y)
      T2: write(Y); write(X)

  44. Deadlocks in distributed databases • However, the following schedule can occur • Now there is a deadlock between T1 and T2 • However, at site 1, the only thing happening is T1 waiting for T2 • At site 2, the only thing happening is T2 waiting for T1 • So no deadlock is detected at individual sites

      T1                     T2
      X-lock(X)              X-lock(Y)
      Write(X)               Write(Y)
      X-lock(Y) -- wait      X-lock(X) -- wait

  45. Deadlocks in distributed databases • Deadlock detection needs to be more careful • A local wait-for graph is constructed at each site • A global wait-for graph combines the information from each site • Deadlock is detected from the global wait-for graph • Notice that having no cycle in any local wait-for graph does not imply that there is no cycle in the global wait-for graph

  46. Deadlocks in distributed databases • (Figure: local wait-for graphs at each site and the corresponding global wait-for graph.)

  47. Deadlocks in distributed databases • A global wait-for graph is constructed and maintained at a single site: the deadlock-detection coordinator • Real graph: Real, but unknown, state of the system. • Constructed graph: Approximation generated by the controller during the execution of its algorithm. • The real graph can be unknown due to • Network delays (changes are not propagated) • Network partition

  48. Deadlocks in distributed databases • the global wait-for graph can be constructed when: • a new edge is inserted in or removed from one of the local wait-for graphs. • a number of changes have occurred in a local wait-for graph. • the coordinator needs to invoke cycle-detection. • If the coordinator finds a cycle, it selects a victim and notifies all sites. The sites roll back the victim transaction.
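
A minimal Python sketch of what the deadlock-detection coordinator does (not from the slides): union the local wait-for graphs and run a cycle search. Graphs are represented as dicts mapping a transaction to the set of transactions it waits for:

    # Build the global wait-for graph from the local ones (sketch).
    def global_graph(local_graphs):
        combined = {}
        for g in local_graphs:
            for t, waits in g.items():
                combined.setdefault(t, set()).update(waits)
        return combined

    # Depth-first search for a cycle in a wait-for graph.
    def has_cycle(graph):
        visited, on_stack = set(), set()
        def dfs(t):
            visited.add(t)
            on_stack.add(t)
            for u in graph.get(t, ()):
                if u in on_stack or (u not in visited and dfs(u)):
                    return True
            on_stack.discard(t)
            return False
        return any(t not in visited and dfs(t) for t in graph)

    # The earlier T1/T2 example: no local cycle, but the global graph has one.
    site1, site2 = {"T1": {"T2"}}, {"T2": {"T1"}}
    print(has_cycle(site1), has_cycle(site2))         # False False
    print(has_cycle(global_graph([site1, site2])))    # True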

  49. Deadlocks in distributed databases • Limitation: false cycles • Suppose the local wait-for graphs are as in the figure (not shown) • Now suppose • T2 releases the resources on S1 • The edge from T1 to T2 should be deleted • Then, T2 requests resources held by T3 on S2 • The edge from T2 to T3 should be added at site 2 • If the second message arrives before the first, then a deadlock is detected while in fact there isn't one • This can be avoided if (global) 2-phase locking is maintained

  50. Timestamp ordering in distributed databases • Timestamp-based techniques can be used in distributed databases • Main issue: how to generate unique timestamps for transactions across multiple sites • Solution: • Each site generates a unique local timestamp using either a logical counter or the local clock. • A globally unique timestamp is obtained by concatenating the unique local timestamp with the unique site identifier.
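
A minimal Python sketch of this scheme (not from the slides): the pair (local timestamp, site id) behaves like the concatenation described above when compared lexicographically:

    import itertools

    # Globally unique timestamps: local value first, site id as the tie-breaker.
    class TimestampGenerator:
        def __init__(self, site_id):
            self.site_id = site_id
            self.local_clock = itertools.count(1)   # logical counter per site

        def next_timestamp(self):
            return (next(self.local_clock), self.site_id)

    gen1, gen2 = TimestampGenerator(1), TimestampGenerator(2)
    print(gen1.next_timestamp())    # (1, 1)
    print(gen2.next_timestamp())    # (1, 2)
    print((1, 1) < (1, 2))          # True: ties broken by the site identifier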
