CSS434 Distributed Transactions and Replication
Download
1 / 25

- PowerPoint PPT Presentation


  • 177 Views
  • Uploaded on

CSS434 Distributed Transactions and Replication Textbook Ch 14 - 15. Professor: Munehiro Fukuda. Outline. Distributed transaction Two-phase commitment protocol File replication Group communication revisited Primary copy replication Active replication Read-any-write-all protocol

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about '' - bronwen


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Slide1 l.jpg

CSS434 Distributed Transactions and Replication

Textbook Ch 14 - 15

Professor: Munehiro Fukuda

CSS434 Replication


Outline l.jpg
Outline

  • Distributed transaction

    • Two-phase commitment protocol

  • File replication

    • Group communication revisited

    • Primary copy replication

    • Active replication

      • Read-any-write-all protocol

      • Available copy protocol

      • Quorum-based protocol

CSS434 Replication


Distributed transaction example banking transaction l.jpg

join

openTransaction

participant

closeTransaction

A

a.withdraw(4);

.

.

join

BranchX

T

participant

b.withdraw(T, 3);

Client

B

b.withdraw(3);

T =

openTransaction

BranchY

join

a.withdraw(4);

c.deposit(4);

participant

b.withdraw(3);

c.deposit(4);

d.deposit(3);

C

closeTransaction

D

d.deposit(3);

Note: the coordinator is in one of the servers, e.g. BranchX

BranchZ

Distributed TransactionExample: Banking Transaction

CSS434 Replication


Transaction commitment l.jpg
Transaction Commitment

  • How can all participant servers either commit a transaction or abort it?

    • One-phase atomic commit protocol

      • The coordinator keep requesting all participants to commit until they return an acknowledgment.

      • No chance of a participant to initiate an abort.

    • Two-phase commit protocol

      • Phase 1: calls for participants’ vote.

      • Phase 2: Complete a commit or an abort according to output of vet.

CSS434 Replication


Two phase c ommit p rotocol operations l.jpg
Two-PhaseCommit Protocoloperations

  • canCommit?(trans)-> Yes / No

    • Call from coordinator to participant to ask whether it can commit a transaction. Participant replies with its vote.

  • doCommit(trans)

    • Call from coordinator to participant to tell participant to commit its part of a transaction.

  • doAbort(trans)

    • Call from coordinator to participant to tell participant to abort its part of a transaction.

  • haveCommitted(trans, participant)

    • Call from participant to coordinator to confirm that it has committed the transaction.

  • getDecision(trans) -> Yes / No

    • Call from participant to coordinator to ask for the decision on a transaction after it has voted Yes but has still had no reply after some delay. Used to recover from server crash or delayed messages.

CSS434 Replication


Two phase c ommit p rotocol communication l.jpg

Coordinator

Participant

step

status

step

status

canCommit?

prepared to commit

1

Yes

(waiting for votes)

2

prepared to commit

(uncertain)

doCommit

3

committed

haveCommitted

4

committed

done

Two-PhaseCommit ProtocolCommunication

CSS434 Replication


Two phase commit protocol state transition l.jpg

Coordinator

Worker 1

Worker 2

INIT

INIT

INIT

Client_wants_to_commit

CanCommit?

CanCommit?

Vote-Yes

CanCommit?

Vote-Yes

WAIT

CanCommit?

Vote-No

CanCommit?

Vote-No

READY

READY

Vote-No

doAbort

Vote-Yes

doCommit

doAbort

Ack

doCommit

Ack

doAbort

Ack

doCommit

Ack

COMMIT

ABORT

COMMIT

ABORT

COMMIT

ABORT

Two-Phase Commit ProtocolState Transition

Another possible cases:

The coordinator didn’t receive all vote-Yes. → Time out and send a doAbort.

A worker didn’t receive a CanCommit?. → All workers eventually receive a doAbort.

A worker didn’t receive a doCommit. → Time out and check the other work’s status.

CSS434 Replication


File replication concepts l.jpg
File ReplicationConcepts

  • Difference between replication and caching

    • A replica is associated with a server, whereas a cache with client.

    • A replicate focuses on availability, while a cache on locality

    • A replicate is more persistent than a cache is

    • A cache is contingent upon a replica

  • Advantages

    • Increased availability/reliability

    • Performance enhancement (response time and network traffic)

    • Scalability and autonomous operation

  • Requirements

    • Naming: no need to be aware of multiple replicas.

    • Consistency: data consistency among replicated files.

    • Replication control: explicit v.s. implicit/lazy replication

    • ACID: Atomicity, Consistency, Isolation, and Durability

CSS434 Replication


File replication basic architectural model l.jpg
File ReplicationBasic Architectural Model

  • Request: send a client request to a server.

  • Coordination: deliver the request to each replica manger in some order.

  • Execution: process a client request but not permanently commit it.

  • Agreement: agree if the execution will be committed (ex. Two-phase commit protocol)

  • Response: respond to the front end

Client

Replica

Manger

Front

End

Replica

Manger

Client

Front

End

Replica

Manger

Ex: DNS Web server

CSS434 Replication


Review group communication l.jpg
Review: Group Communication

  • Group membership service

    • Create and destroy a group.

    • Add or withdraw a replica manager to/from a group.

    • Detect a failure.

    • Notify members of group membership changes.

    • Provide clients with a group address.

  • Message delivery

    • Absolute ordering

    • Consistent ordering

Replica

Manger

Replica

Manger

Client

Replica

Manger

Replica

Manger

group

CSS434 Replication


Review group communication example isis l.jpg

Deleted or delivered?

Review: Group CommunicationExample: ISIS

  • Group view

multicast

Joins the group

p1

multicast

p2

multicast

rejoins

crashed

p3

p4

Multicast to

available processes

In ISIS, if P4 receives this partially multicast message at the same time when

it knows p3 has been crashed, it forwards it to all the others and immediately

sends a flush message. In other words, P1, P2, and P4 receive this multicast

message as if P3 was still alive.

CSS434 Replication


Review group communication absolute ordering linearizability l.jpg
Review: Group CommunicationAbsolute Ordering - Linearizability

  • Rule:

    • Mi must be delivered before mj if Ti < Tj

  • Implementation:

    • A clock synchronized among machines

    • A sliding time window used to commit message delivery whose timestamp is in this window.

  • Example:

    • Distributed simulation

  • Drawback

    • Too strict constraint

    • No absolute synchronized clock

    • No guarantee to catch all tardy messages

Ti < Tj

Ti

mi

Tj

mi

mj

mj

CSS434 Replication


Review group communication total ordering sequential consistency l.jpg
Review: Group CommunicationTotal Ordering - Sequential Consistency

  • Rule:

    • Messages received in the same order (regardless of their timestamp).

  • Implementation:

    • A message sent to a sequencer, assigned a sequence number, and finally multicast to receivers

    • A message retrieved in incremental order at a receiver

  • Example:

    • Replicated database update

  • Drawback:

    • A centralized algorithm

Ti < Tj

Ti

Tj

mj

mj

mi

mi

CSS434 Replication


Multi copy update problem l.jpg
Multi-copy Update Problem

  • Keep in mind the basic architecture and group communication models, how can we update multiple copies over replica servers?

    • Read-only replication

      • Allow the replication of only immutable files.

    • Primary backup replication

      • Designate one copy as the primary copy and all the others as secondary copies.

    • Active backup replication

      • Access any or all of replicas

        • Read-any-write-all protocol

        • Available-copies protocol

        • Quorum-based consensus

CSS434 Replication


Primary copy replication l.jpg
Primary-Copy Replication

  • Request: The front end sends a request to the primary replica.

  • Coordination:. The primary takes the request atomically.

  • Execution: The primary executes and stores the results.

  • Agreement: The primary sends the updates to all the backups and receives an ask from them.

  • Response: reply to the front end.

  • Advantage: an easy implementation, linearizable, coping with n-1 crashes.

  • Disadvantage: large overhead especially if the failing primary must be replaced with a backup.

Client

Replica

Manger

Front

End

Primary

Backup

Replica

Manger

Client

Front

End

Replica

Manger

Backup

Ex: Sun NIS (Yellow Page)

CSS434 Replication


Active replication l.jpg
Active Replication

  • Request: The front end multicasts to all replicas.

  • Coordination:. All replica take the request in the sequential order.

  • Execution: Every replica executes the request.

  • Agreement: No agreement needed.

  • Response: Each replies to the front.

  • Advantage: achieve sequential consistency, cope with (n/2 – 1) byzantine failures

  • Disadvantage: no more linearizable

Client

Replica

Manger

Front

End

Replica

Manger

Client

Front

End

Replica

Manger

CSS434 Replication


Read any write all protocol l.jpg
Read-Any-Write-All Protocol

  • Read

    • Lock any one of replicas for a read

  • Write

    • Lock all of replicas for a write

  • Sequential consistency

  • Intolerable for even 1 failing replica upon a write.

Read from any one of them

Client

Replica

Manger

Front

End

Replica

Manger

Write to all of them

Client

Front

End

Replica

Manger

Replica

Manger

CSS434 Replication


Available copies protocol l.jpg

X

Available-Copies Protocol

  • Read

    • Lock any one of replicas for a read

  • Write

    • Lock all available replicas for a write

  • Recovering replica

    • Bring itself up to date by coping from other servers before accepting any user request.

  • Better availability

  • Cannot cope with network partition. (Inconsistency in two sub-divided network groups)

Read from any one of them

Client

Replica

Manger

Front

End

Write to all available replicats

Replica

Manger

Client

Front

End

Replica

Manger

Replica

Manger

CSS434 Replication


Available copies protocol example 1 gossip l.jpg
Available Copies ProtocolExample 1: Gossip

If (Tj > Tk)

update RMk

else

discard the gossip message

Categorized in lazy available

copies protocol

Tardy messages are ignored

RMk

Gossip

RMj

(Tj)

RMi

(Ti)

Update, Tf

Update id

Query, Tf

Value, Ti

If (Tf > Tj)

update RMj

else {

update Client

or

ignore and update RMj}

If (Tf < Ti)

return value

else {

waits for RMi to be updated

or

query RMj/RMk}

FE

(Tf)

FE

Query

Value

Update

Client

Client

CSS434 Replication


Slide20 l.jpg

Primary

RM

RM

Sent first

Sent later

FE

FE

FE

FE

Tn

T0

T3

T1

Client

Client

Client

Client

Executive: book 3pm

Secretary and other employees:

book 3pm

Available Copies ProtocolExample 2: Bayou

Committed

Tentative

  • To make a tentative update committed:

    • Perform a dependency check

      • Check conflicts

      • Check priority

    • Merge Procedure

      • Cancel tentative updates

      • Change tentative updates

Categorized in lazy available

copies protocol

Tardy messages are reordered

or merged.

Tn

Tn+1

C0

C1

C2

T0

T1

T2

T3

CN

CSS434 Replication


Network partitions well known solution quorum based protocols l.jpg

Read quorum

Replica

Manger

Replica

Manger

Replica

Manger

Client

Front

End

Replica

Manger

Replica

Manger

Replica

Manger

Client

Front

End

Replica

Manger

Replica

Manger

Write quorum

Network PartitionsWell-known Solution: Quorum-Based Protocols

  • Read

    • Retrieve the read quorum

    • Select the one with the latest version.

    • Perform a read on it

  • Write

    • Retrieve the write quorum.

    • Find the latest version and increment it.

    • Perform a write on the entire write quorum.

  • If a sufficient number of replicas from read/write quorum, the operation must be aborted.

#replicas in read quorum + #replicas in write quorum > n

Read-any-write-all: r = 1, w = n

CSS434 Replication


Network partitions system example coda l.jpg

  • Normal case:

    • Read-any, write-all protocol

    • Whenever a client writes back its file, it increments the file version at each server.

  • Network disconnection:

    • A client writes back its file to only available servers.

    • Version conflicts are detected and resolved automatically when network is reconnected

  • Client disconnection:

    • A client caches as many files as possible (in hoard walking).

    • A client works in local if disconnected (in emulation mode).

    • A client writes back updated files to servers (in reintegration mode).

W

W

W

Version[3,3,2]

Version[3,3,2]

Version[2,2,3]

emulation

Version[2,2,2]

Version[2,2,2]

Version[2,2,2]

Version[1,1,1]

Version[1,1,1]

Version[1,1,1]

hoard

reintegration

Network PartitionsSystem example: Coda

Server 2

Server 1

Server 3

CSS434 Replication


Paper review by students l.jpg
Paper Review by Students

  • ISIS System

  • Gossip Architecture

  • Bayou System

  • Coda

  • Discussions

    • What if a message is lost in ISIS group communication? What if another crash occurs when unstable/flush messages are exchanged?

    • What performance drawbacks does Gossip have?

    • What problems remain to users in Bayou?

    • Why doesn’t Coda use read/write quorum?

CSS434 Replication


Non turn in exercises l.jpg

Coordinator

Worker 1

Worker 2

INIT

INIT

INIT

Client_wants_to_commit

CanCommit?

CanCommit?

Vote-Yes

CanCommit?

Vote-Yes

WAIT

CanCommit?

Vote-No

CanCommit?

Vote-No

READY

READY

Vote-No

doAbort

Vote-Yes

doCommit

doAbort

Ack

doCommit

Ack

doAbort

Ack

doCommit

Ack

COMMIT

ABORT

COMMIT

ABORT

COMMIT

ABORT

Non-Turn-In Exercises

  • The following state transition diagram describes the two-phase commitment protocol. Let’s assume that worker1 crashed when a coordinate sent a commit message. Trace this diagram. To be specific, make appropriate dashed arrows “thick and solid arrows” with your pen or pencil.

CSS434 Replication


Non turn in exercises25 l.jpg
Non-Turn-In Exercises

  • Textbook p762, Q17.1: In a decentralized variant of the two-phase commit protocol, the participants communicate directly with one another instead of indirectly via the coordinator. In phase 1, the coordinator sends its vote to all the participants. In phase 2, if the coordinator’s vote is No, the participants just abort the transaction; if it is Yes, each participant sends its vote to the coordinator and the other participants, each of which decides on the outcome according to the vote and carries it out. Calculate the number of messages and the number of rounds it takes. What are its advantages or disadvantages in comparison with the centralized variant?

  • Textbook p816, Q18.10: Explain why allowing backups to process read operations directly, (i.e., without contacting a primary), leads to sequentially consistent rather than linearizable executions in a primary-copy replication.

  • Textbook p816, Q18.11: Could the gossip architecture be used for a distributed computer game as describe below?

  • The players move figures around a common scene. The state of the game is replicated at the players’ workstations and at a server, which contains services controlling the game overall, such as collision detection. Updates are multicast to all replicas.

  • The quorum-based replication protocol can address network partition problems. Why didn’t Coda use this protocol? Explain the reason.

  • What if a message is lost in ISIS group communication? Describe a solution.

CSS434 Replication


ad